Figure 7: Mean latencies of correct responses and error rates in the same-different task: V(C)-CV and VC-V stimuli paired with VC stimuli, and VC pairs.
II. MANUSCRIPTS AND EXTENDED REPORTS

On the Dissociation of Spectral and Temporal Cues to the Voicing Distinction in Initial Stop Consonants*

Quentin Summerfield and Mark Haggard+



ABSTRACT

It has been claimed that a rising first-formant (F1) transition is an important cue to the voiced-voiceless distinction for syllable-initial, prestressed stop consonants in English. Lisker (1975) has pointed out that the acoustic manipulations suggesting a role for F1 have involved covariation of the onset frequency of F1 with the duration, and hence the frequency extent, of the F1 transition; he has also argued that effects hitherto ascribed to the transition are more properly attributed to its onset. Two experiments are reported in which F1 onset frequency and F1 transition duration/extent were manipulated independently. The results confirm Lisker's suggestion that the major effect of F1 in initial voicing contrasts is determined by its perceived frequency at the onset of voicing and show that a periodically excited F1 transition is not, per se, a positive cue to voicing. In a third experiment, the frequencies at the onset of voicing of both F1 and




*A partial summary of these results was presented at the 90th meeting of the Acoustical Society of America, San Diego, California, November 1975. This paper has been accepted for publication in the Journal of the Acoustical Society of America.

+The Medical Research Council Hearing Research Institute, Nottingham, England.

Acknowledgment: Experiment I was conducted in the Department of Psychology at the Queen's University of Belfast, Northern Ireland with the support of grant AT/2058/b21/HQ from the Joint Speech Research Unit, U.K. and grant B/SG/1466 from the Science Research Council, U.K. It was reported as "First formant onset frequency as a cue to the voicing distinction in prestressed, syllable-initial stop-consonants," in Speech Perception No. 5, pp 25-33. (Progress Report, Department of Psychology, The Queen's University of Belfast). This paper was written, and the later experiments were carried out, at the Haskins Laboratories, New Haven, Connecticut, U.S.A. while Quentin Summerfield was supported by a N.A.T.O. postdoctoral research fellowship. We should like to express our appreciation to Alvin Liberman for his generous hospitality and encouragement, and to Bruno Repp, Peter Bailey, Gary Kuhn and David Pisoni for their criticisms of earlier drafts of this manuscript.
F2 were manipulated. The influence on the perception of stop-consonant voicing that resulted was determined specifically by the frequency of F1, rather than by the overall distribution of energy in the spectrum. The results demonstrate a complementary relationship between perceptual cue sensitivity and production constraints: in production, the VOT characterizing a particular stop-consonant varies inversely with the degree of vocal tract constriction, and hence the frequency of F1 required by the phoneme following the stop; in perception, the lower the frequency of F1 at the onset of voicing, the longer the VOT that is required to cue voicelessness. In this way, the inclusion of F1 onset frequency in the cue-repertoire for voicing reduces the noninvariance problem for perception.

INTRODUCTION

Lisker and Abramson (1964) suggested that the articulatory basis for the
voiced-voiceless distinction for stop-consonants resides in the relative
timing of laryngeal and supralaryngeal articulations. Prestressed, syllable-
initial voiced stops in English display temporal coincidence of oral release
with the onset of laryngeal vibration. When the onset of vocal cord
vibration follows oral release by more than about '^0 msec, the stop is 
voiceless. By translating variation in this articulatory dimension into variation of the parametric input to an acoustic speech synthesizer, Lisker and Abramson (1967) generated VOT¹ continua that spanned the two perceptual categories of voicing for each of the three places of stop production used in English. Phoneme boundaries on these continua occurred close to those values of VOT that optimally segregate voiced from voiceless stops in the productions of English speakers. Since then, VOT continua have been used extensively as experimental devices. They permit the determination of a phoneme boundary, changes in whose position can be used as sensitive indices of the perceptual consequences of variation of parameters both intrinsic (for example, Stevens and Klatt, 1974) and extrinsic (for example, Eimas and Corbit, 1973; Summerfield, 1975a) to the test syllables themselves. However, it has not always been clear which aspects of the stimulus are held to be perceptual cues, given that many of the acoustical parameters so far asserted to possess cue value have tended to covary. Incorporating covariation in a set of stimuli is well justified from an articulatory point of view if the objectives of an experiment are linguistic or cognitive. But, if the objectives are psychoacoustical or perceptual, then the use of covarying parameters begs the question of what acoustical variables are registered and


¹With reference to the acoustics of production, the term 'VOT' will refer to the time interval between the onset of the occlusion release transient and the onset of quasiperiodicity. With reference to continua of synthetic stimuli, the term 'VOT' will refer to the interval between the onset of the stimulus (that may or may not include a burst) and the onset of periodic excitation. During this interval, the presence of noise excitation in F2 and the higher formants, and the absence of energy in F1, is implied. The term 'separation interval' will refer only to the temporal aspect of VOT.



contribute to the perception of the contrast. A precise specification of the perceptually pertinent parameters is important if valid interpretations are to be made of data obtained using various types of continua whose members are said to vary in "VOT".

Using synthetic stimuli, Summerfield and Haggard (1974) artificially varied the temporal separation of the fricated burst from the events that normally follow it: formant transitions and the onset of periodicity. They demonstrated that the temporal interval is indeed a powerful perceptual cue, whether or not it is filled with aspiration. The question remains: which of the spectral parameters of VOT whose variation is normally correlated with that of the separation interval are also perceptual cues? Stevens and Klatt (1974) suggested that some threshold duration or spectral extent of first formant (F1) transition may be psychoacoustically a more basic cue to the voiced value of the feature, and that VOT (that is, the temporal separation interval) is grafted onto this through learning in infancy. Summerfield and Haggard (1974) showed that the detectability of transitions in both the first and higher formants, whether or not they were periodically excited, could provide important secondary cues for adults. Lisker (1975) has argued that the simple articulatory basis of VOT (for example, Lisker and Abramson, 1971) renders it the most general and basic cue, but proposed that if any secondary aspect of the acoustical array related to formant transitions is important, then it is the onset frequency of F1 rather than its dynamic spectral properties. Lisker's data show that, when the importance of the spectral cues is assessed by trading them against VOT, which in turn affects the values of the secondary transition cues, then VOT does emerge as the most potent perceptual cue. However, his results, based on a nonorthogonally varying stimulus set, implicate the average frequency region of F2 as a functioning secondary cue in addition to F1 onset frequency. The experiments reported here were designed to refine and extend Lisker's conclusion and to reduce the ambiguity by using orthogonally varying stimulus arrays. The matter can be simplified by asking three questions. Does F1 onset cue a voiced percept in inverse relation to its frequency? Is a rising F1 transition a positive cue to voicing independent of its onset frequency? Are spectral influences on the perception of voicing a function only of the frequency of F1 or of the distribution of energy in both F1 and the higher formants? Experiments I and II were designed to answer the first two of these questions. Experiment III was designed to answer the third question.

EXPERIMENT I: Conditions 1 and 2. 

In the first condition of Experiment I, the frequency of a fixed-frequency, transitionless F1 was systematically lowered across a set of consonant-vowel (CV) VOT continua. If Lisker's (1975) conclusion is correct, this procedure should increase the probability of a voiced percept at any given VOT. In the second condition, the onset frequency of F1 was held constant independently of the realized VOT, while the duration, and consequently the spectral extent, of F1 transition following voicing onset were systematically increased. If a periodically excited F1 transition is, per se, a cue to voicing, then this procedure should increase the probability of a voiced percept at any given VOT.
Stimuli and Procedure



Both conditions of Experiment I were run interactively with stimuli generated at run-time by a Fonema OVE IIIb serial resonance speech synthesizer controlled by the SPEX program (Draper, 1973) running on a D.E.C. PDP-12 digital computer. Stimuli were exemplars drawn from /g-k/ VOT continua spanning the VOT range from 0 msec to +80 msec in 1-msec steps. The closed-loop algorithm controlling stimulus presentation was an implementation of PEST (Taylor and Creelman, 1967) with the following control parameters: deviation limit of the sequential test (W) = 0.5²; starting step size = 16 msec; terminating step size = 1 msec. These parameters result in an estimate of the p 0.5 point on the psychometric function underlying the physical test continuum; this point corresponds to the phoneme boundary. To achieve a controlled estimate of the position of the boundary, two PEST runs were randomly interleaved with starting points randomly drawn from preselected ranges approximately evenly balanced on either side of the subject's expected phoneme boundary region. The two interleaved runs converged independently from starting points at long and short VOTs, and subjects were unaware of performing in a closed-loop situation. Convergence was continued until the step size of each run had diminished to less than or equal to 1 msec and the VOTs corresponding to the p 0.5 estimates from each run were within 5 msec of one another. The phoneme boundary position is here defined as the average of these two independent estimates. Previously, open-loop and closed-loop procedures for estimating phoneme boundaries have been compared and shown to produce highly similar results (Summerfield, 1974a).
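The closed-loop logic described above can be illustrated with a minimal sketch. It is not the PEST implementation used in the experiment: it assumes a plain up-down rule in which the level moves after every response (as the text says happens with W = 0.5) and the step size is halved at each reversal. The class name, the function names and the noise-free `ideal_listener` are all hypothetical, introduced only for illustration.

```python
import random


class Staircase:
    """One adaptive run over a VOT continuum (simplified; not full PEST)."""

    def __init__(self, start_vot, start_step=16, final_step=1, lo=0, hi=80):
        self.vot = start_vot          # VOT (msec) of the next stimulus
        self.step = start_step        # current step size (msec)
        self.final_step = final_step  # terminating step size (msec)
        self.lo, self.hi = lo, hi     # limits of the continuum (msec)
        self.last_direction = 0       # +1 = lengthen VOT, -1 = shorten VOT
        self.estimate = None          # set once the run has converged

    def update(self, heard_voiceless):
        """Register a /k/ (True) or /g/ (False) response and move the level."""
        direction = -1 if heard_voiceless else 1
        reversal = self.last_direction != 0 and direction != self.last_direction
        if reversal and self.step <= self.final_step:
            # The last two levels straddle the boundary; take their midpoint.
            self.estimate = self.vot + direction * self.step / 2
            return
        if reversal:
            self.step = max(self.final_step, self.step // 2)
        self.vot = min(self.hi, max(self.lo, self.vot + direction * self.step))
        self.last_direction = direction


def estimate_boundary(listener, long_start, short_start, rng, max_trials=500):
    """Interleave two runs at random and average their converged estimates."""
    runs = [Staircase(long_start), Staircase(short_start)]
    for _ in range(max_trials):
        active = [run for run in runs if run.estimate is None]
        if not active:
            break
        run = rng.choice(active)      # random interleaving of the two runs
        run.update(listener(run.vot))
    if any(run.estimate is None for run in runs):
        raise RuntimeError("runs did not converge")
    first, second = (run.estimate for run in runs)
    if abs(first - second) > 5:
        raise RuntimeError("estimates differ by more than 5 msec")
    return (first + second) / 2


# Hypothetical noise-free listener whose boundary lies between 29 and 30 msec.
def ideal_listener(vot):
    return vot >= 30


boundary = estimate_boundary(ideal_listener, 60, 5, random.Random(0))
print(boundary)
```

A real listener responds probabilistically near the boundary, so the two runs would generally converge on slightly different values; the 5 msec agreement criterion in the text guards against accepting a pair of estimates that have drifted apart.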

The stimuli used in each condition were constructed from seven five-formant CV 'stimulus types'. A stimulus type is that set of synthesis control parameters that generates a stimulus with a VOT of 0 msec. The frequency contours of F2 and F3 did not differ between stimulus types and were constructed with initial formant transitions appropriate for the velar place of articulation. These transitions were linear in frequency/time over their duration of 44 msec. The F2 transition had its onset at 2400 Hz and reached a steady state at 2000 Hz. The F3 transition had its onset at 2600 Hz and reached a steady state at 3000 Hz. F4 and F5 were set to 3500 Hz and 5000 Hz, respectively. The total duration of each stimulus type was 320 msec. The seven stimulus types used in Condition 1 were distinguished by the frequencies of their first formants that were set to 200, 225, 250, 275, 300, 350, and 400 Hz. The seven stimulus types used in Condition 2 were distinguished by the duration of their F1 transitions; these transitions always onset at 250 Hz and rose linearly at 5 Hz per msec for either 0, 6, 12, 18, 24, 30 or 36 msec after voicing onset. No other synthesis control parameters were varied between stimulus types or conditions. Over the first 80 msec of each stimulus type, the overall amplitude contour was constant and the fundamental frequency (F0) was fixed at 100 Hz so that differences in F0 at voicing onset could not accompany differences in VOT. A stimulus with any VOT in the range 0 msec to +80 msec could be constructed from any one of the



²With W=0.5 the PEST algorithm is simplified. The Wald sequential decision test is obviated and a change in stimulus value occurs after every response.
Figure 1: Schematic spectrograms showing the patterns of the first three formants for the seven stimulus types used in Experiment I, Condition 1 in exemplars with VOTs of 0 msec (left) and +20 msec (right). Solid lines indicate periodic and dotted lines aperiodic formant excitation. The stimulus types are distinguished by the frequencies of their transitionless first formants.
Figure 2: As Figure 1 for the seven stimulus types used in Experiment I, Condition 2. The stimulus types are distinguished by the duration and extent of their first formant transitions which onset, independently of VOT, at 250 Hz.



stimulus types by algorithm. The algorithm replaced the periodic excitation prior to the specified VOT with noise excitation [illegible] and also widened the bandwidth of F1 from [illegible] Hz to [illegible] Hz for the [illegible] syllable. This procedure reduces the level of aperiodic energy in F1 and thereby simulates the acoustic consequences of coupling the pharynx to the trachea. The onset of pitch-pulsing was synchronized to the specified VOT by the procedure described by Draper and Haggard (197?). Figure 1 illustrates the differences between the stimulus types used in Condition 1 in displays of the formant parameter specifications F1, F2 and F3 of exemplars with VOTs of 0 msec and +20 msec. Figure 2 displays analogous patterns for the stimuli used in Condition 2 and shows that in order to hold the onset frequency of F1 constant as VOT varied, it was necessary to restructure the spectral relation between F1 and the higher formants in a manner that is not representative of any naturally occurring variation.
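The Condition 2 manipulation can be made concrete with a short sketch of the F1 parameter track after voicing onset. It uses the values given in the text (onset at 250 Hz, a rise of 5 Hz per msec, transition durations of 0 to 36 msec). The function name, the 1-msec sampling grid and the assumption that every stimulus ends at 320 msec regardless of VOT are illustrative choices, not details taken from the synthesis program.

```python
def f1_contour(transition_ms, vot_ms, onset_hz=250, rate_hz_per_ms=5, total_ms=320):
    """Return F1 (Hz) at each msec from voicing onset to the end of the syllable.

    The 1-msec grid and the fixed 320-msec end point are assumptions.
    """
    contour = []
    for t in range(vot_ms, total_ms):
        # F1 rises linearly until the transition is over, then holds steady.
        elapsed = min(t - vot_ms, transition_ms)
        contour.append(onset_hz + rate_hz_per_ms * elapsed)
    return contour


durations = [0, 6, 12, 18, 24, 30, 36]
steady_states = [f1_contour(d, vot_ms=20)[-1] for d in durations]
print(steady_states)  # [250, 280, 310, 340, 370, 400, 430]
```

Whatever the VOT, the first value of the track is 250 Hz, which is the sense in which onset frequency is held constant while transition duration, extent and steady-state frequency covary.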

Six adult subjects performed in the experiment, three in the order
Condition 1 - Condition 2, and three in the reverse order. Each was a native
speaker of British English and had served previously in experiments involving
closed-loop phoneme boundary estimation. Stimuli were presented binaurally
through AKG K60 600-Ohm headphones to subjects who sat in a sound-damped
cubicle. The peak intensity of presentation was constant across subjects at
approximately 65 dB SPL for stimuli with 0 msec VOT derived from the two
identical stimulus types (Types 3 and 1 in Conditions 1 and 2, respectively).

Subjects were instructed to identify the initial consonant of each stimulus as either /g/ or /k/ and to indicate their response by pressing one of two buttons labeled 'G' and 'K'. A third button, labeled 'V', could be pressed to summon a repetition of the current stimulus. Each subject ran through the whole set of continua twice. In Condition 1, three subjects experienced the continua in ascending, followed by descending, order of F1 frequencies, and three in descending, followed by ascending order. The two estimates obtained were averaged to provide a single estimate for each subject on each continuum. Analogous order balancing was employed in Condition 2. The lack of naturalness inherent in the stimulus structure posed no difficulty for listeners, although some subjects reported hearing stimuli with long VOTs and extensive F1 transitions in Condition 2 as initiated by the cluster /kl/, rather than by the single consonant /k/.



Results 



The seven boundary positions obtained for each subject in each condition are plotted against the frequency of F1 for Condition 1 in Figure 3, and against both the duration of the F1 transition and the frequency of the F1 steady state for Condition 2 in Figure 4. Mean boundary positions obtained by averaging these data over subjects are tabulated in Table 1 for Condition 1 and in Table 2 for Condition 2.

The results of Condition 1 support Lisker's (1975) conclusion that the onset frequency of F1 can function as a voicing cue: the data in Table 1 show that the position of the phoneme boundary averaged across subjects decreases monotonically as the frequency of a transitionless first formant is raised. Only subject 6 failed to show an overall decrement. The seven phoneme boundaries from each of the six subjects were examined together in a
Figure 3: Results of Experiment I, Condition 1 for six individual subjects. Each point plots the mean of four phoneme boundary estimates (derived from two pairs of interleaved closed-loop estimates). Points corresponding to each subject have been connected by straight lines, showing that for five of the subjects, the voicing boundary shifted to shorter VOTs as the onset frequency of F1 increased.




Figure 4: Results of Experiment I, Condition 2 plotted as for Figure 3. The functions for individual subjects are either horizontal (S2, S5) or decline (S1, S3, S4, S6) as the duration of the F1 transition increased, showing that the presence of an F1 transition does not predispose voiced percepts when its onset frequency is fixed.



TABLE 1: Experiment I: Condition 1.

Mean phoneme boundaries in msec of VOT (PBs) averaged over two estimates by each of 6 subjects on seven /g-k/ VOT continua differentiated by the frequency of a constant frequency, transitionless first formant (200 Hz - 400 Hz).

Number and first formant frequency (Hz) of Stimulus Type:

Stimulus Type     (1)     (2)     (3)     (4)     (5)     (6)     (7)
F1 (Hz)           200     225     250     275     300     330     400
PBs (msec)      33.81   30.99   29.53   28.13   26.64   2?.73   22.59



TABLE 2: Experiment I: Condition 2.

Mean phoneme boundaries in msec of VOT (PBs) averaged over two estimates by each of 6 subjects on seven /g-k/ VOT continua differentiated by the durations of their first formant transitions (0 msec - 36 msec) that onset at 250 Hz independently of VOT.

Number and first formant transition duration (msec) of Stimulus Type:

Stimulus Type     (1)     (2)     (3)     (4)     (5)     (6)     (7)
Duration (msec)     0       6      12      18      24      30      36
PBs (msec)      28.25   28.01   27.59   25.74   26.04   26.81   26.18

nonparametric test for monotonic trend (Ferguson, 1966) that gave a value of the normal deviate equal to 6.19, indicating that the trend is significant (p < 0.01; 2-tailed). The results of Condition 2 indicate that variation in F1 transition duration/extent does produce a small effect on the perception of stop voicing. However, it is not in the direction predicted from the arguments of Stevens and Klatt (1974) or Summerfield and Haggard (1974) on the basis of transition detectability. Table 2 shows that a fall in the value of VOT at the phoneme boundary occurred as transition duration increased. This trend is evident in the data of Subjects 1, 3, 4 and 6 and is also significant (z=2.58; p < 0.05; 2-tailed).
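The direction of the Condition 2 trend can be checked informally from the group means in Table 2. The sketch below counts rising and falling pairs in the ordered means, which is the quantity usually called Kendall's S. This is only an illustration on the tabulated means: it is not the test of Ferguson (1966) that the authors applied to the individual subjects' data, and the function name is hypothetical.

```python
from itertools import combinations


def kendall_s(values):
    """Number of rising pairs minus number of falling pairs, taken in order."""
    s = 0
    for earlier, later in combinations(values, 2):
        if later > earlier:
            s += 1
        elif later < earlier:
            s -= 1
    return s


# Mean phoneme boundaries (msec) from Table 2, ordered by transition duration.
table2_means = [28.25, 28.01, 27.59, 25.74, 26.04, 26.81, 26.18]
s = kendall_s(table2_means)
n_pairs = len(table2_means) * (len(table2_means) - 1) // 2
print(s, n_pairs)  # -11 21
```

Of the 21 ordered pairs, 16 fall and 5 rise, so the means decline overall but not strictly monotonically, consistent with the description of the effect as small.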

Discussion

The results of Experiment I imply that the critical aspect of F1 for the perception of stop-voicing is its perceived frequency at the onset of voicing, and suggest that an F1 transition as such does not specifically predispose a voiced percept. However, the relative amplitudes in the outputs of a serial resonance synthesizer are not fixed, but vary according to the formant frequency separations (c.f. Fant, 1960). In natural productions, constricting the supralaryngeal vocal tract lowers the frequency of F1 and reduces the amplitudes of the higher formants and the overall intensity of the output. Increasing the frequency of F1 in an OVE synthesizer raises the overall intensity of the output, including the higher formants, so that the distribution of energy in the spectrum increasingly favours higher frequencies. Accordingly, the results of Experiment I could reflect perceptual sensitivity either to changes in the location of the first spectral peak at the onset of periodicity, or alternatively, to changes in the amplitude of that peak relative to peaks at higher frequencies. To determine which interpretation is more appropriate, a control experiment was run using stimuli generated on a parallel formant synthesizer whose formant amplitudes could be specified individually and for which, therefore, the frequency of F1 and the relative amplitudes of the first three formants could be varied independently.

EXPERIMENT II: Control Conditions 1 and 2

In the first control condition, nine VOT continua were created by combining each of three values of F1 onset frequency with each of three extents of F1 transition. Within each continuum, the onset frequency of F1 was held constant as in Condition 2 of Experiment I. If the results of that condition reflect perceptual sensitivity to changes in the onset frequency of F1, then phoneme boundaries should vary here with F1 onset frequency, but not with F1 transition extent. In the second control condition, the amplitude of F1 relative to F2 was varied over a 12 dB range across three VOT continua while the spectral specification of the stimuli comprising the continua was unchanged. If the results of Experiment I reflect perceptual sensitivity to changes in relative formant amplitudes, then phoneme boundaries should shift to shorter VOTs as the intensity of F1 is reduced relative to F2. Alternatively, if the results reflect sensitivity to the frequencies of spectral peaks at the onset of periodicity, rather than to their absolute or relative amplitudes, then the three boundaries should not differ.

Control Condition 1: Stimuli and Procedure

Nine two-formant /g-k/ VOT continua were synthesized on the parallel resonance synthesizer at Haskins Laboratories (Mattingly, 1968). Each continuum consisted of eight 300 msec stimuli that varied in VOT from +15 msec to +50 msec in 5 msec steps with the onset of pitch pulsing synchronized to the intended VOT. As VOT increased along each continuum, the amplitude of F1 was reduced to zero and F2 was excited by noise. Stimuli with the same VOT in different continua were differentiated only by the frequencies of their first formants. Within any continuum the actual onset frequency of F1 was fixed and did not vary with VOT. Nine continua were created by combining three values of F1 onset frequency (208, 311 and 412 Hz) with three frequency extents of F1 transition (200, 100 and 0 Hz). The duration of these transitions was 20 msec. (The first formant frequency parameter changed over five successive 5-msec intervals, reaching a steady state in the fifth interval.) The transition rates were, therefore, 10 Hz/msec, 5 Hz/msec and 0 Hz/msec. The transition rate of 5 Hz/msec is the same as that used in Condition 2 of Experiment I. The transition duration of 20 msec is longer
than the .5 msec that Stevens and Klatt (1974) showed to be the 75 percent 
threshold duration for detection of an F1 transition changing at a rate of 8.5 Hz/msec. The acoustic differences among the members of the continua are exemplified in Figure 5, where the formant parameter specifications of stimuli in which F1 onsets at 208 Hz with VOTs of 0 msec and +20 msec are displayed.
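The factorial structure of Control Condition 1 can be enumerated directly from the values given above. The sketch below builds the 72 stimulus specifications and derives each transition rate from its extent and the 20-msec duration. The dictionary keys are hypothetical names, and the steady-state frequency is computed on the assumption that each transition rises from its onset by the stated extent.

```python
from itertools import product

ONSETS_HZ = (208, 311, 412)        # F1 onset frequencies
EXTENTS_HZ = (0, 100, 200)         # F1 transition extents
VOTS_MS = tuple(range(15, 55, 5))  # +15 to +50 msec in 5-msec steps
TRANSITION_MS = 20                 # duration of every F1 transition

stimuli = [
    {
        "onset_hz": onset,
        "extent_hz": extent,
        "vot_ms": vot,
        "rate_hz_per_ms": extent / TRANSITION_MS,
        # Assumption: the transition rises from the onset by its extent.
        "steady_state_hz": onset + extent,
    }
    for onset, extent, vot in product(ONSETS_HZ, EXTENTS_HZ, VOTS_MS)
]

print(len(stimuli))                                     # 72
print(sorted({s["rate_hz_per_ms"] for s in stimuli}))   # [0.0, 5.0, 10.0]
```

Because onset frequency and extent are crossed, the two factors are orthogonal in this design, unlike Condition 2 of Experiment I, where a single onset frequency was used.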

Two groups of subjects listened to a randomization that included ten occurrences of each of the 72 stimuli. Stimuli were presented binaurally through Grason-Stadler TDH39-300Z headphones at a level of 85 dB SPL (peak deflection). One group of subjects consisted of six members of the research staff of Haskins Laboratories, any of whose residual phonetic naivety was dispelled by a description of the acoustic structure of the stimuli. The other group consisted of nine students attending a Yale University summer school who declared themselves to be phonetically naive. Subjects were instructed to make a forced choice identification of the initial consonant of each stimulus as either /g/ or /k/, but to indicate in addition if the sound that they heard was not a satisfactory exemplar of a CV syllable initiated by either /g/ or /k/.

Control Condition 1: Results

Four of the experienced subjects and six of the naive subjects exhibited predictable performance: they reported few instances of stimuli initiated by phonemes other than /g/ or /k/ and reported increasing numbers of /k/ percepts as VOT increased along each continuum. However, the VOT range +15 msec to +50 msec was not sufficient to permit the computation of a phoneme boundary for every subject in every condition. Accordingly, the data from Condition 1 are summarized in Table 3 not as phoneme boundary positions, but as percentages of /g/ responses made by these ten subjects to the eight members of each continuum combined. Figure 6 displays plots of the percentage of /g/ responses made to each stimulus in each continuum averaged across these subjects. Each point plots the mean of 100 observations.
Figure 5: Schematic spectrograms showing the pattern of the first two formants for stimuli used in Experiment II, Condition 1 in exemplars with VOTs of 0 msec (left) and +20 msec (right) in which F1 onsets at 208 Hz. Stimuli were derived from nine VOT continua distinguished by a) the onset frequencies of their first formants (208, 311 or 412 Hz), and b) the extent of their first formant transitions (0, 100 or 200 Hz).
Figure 6: Results of Experiment II, Condition 1 pooled over 10 subjects.



TABLE 3: Experiment II: Condition 1.

Percentages of 'G' responses made to the members of nine /g-k/ VOT continua averaged over 10 subjects. Each continuum consisted of eight members ranging in VOT from +15 msec to +50 msec. The continua were distinguished by the onset frequency of their first formants (208, 311 or 412 Hz) and by the frequency extent of their first formant transitions (0, 100 or 200 Hz).

                              First Formant Onset Frequency (Hz)
                                 208       311       412

First Formant          0        77.8      66.6      33.5
Transition           100        69.2      48.9      34.1
Extent (Hz)          200        60.4      52.9      41.5

A difference of 12.6 percent between any pair of means is sufficient for a posteriori significance at the p < 0.01 level.



TABLE 4: Experiment II: Condition 2.

Mean phoneme boundaries in msec of VOT estimated by Probit Analysis for each of seven subjects (S) on each of three /g-k/ VOT continua. The continua were distinguished by the relative intensities of F1 and F2. F1 varied through a 12 dB range across the three continua (-6 dB, 0 dB and +6 dB relative to F2).

                 Relative Intensity of F1:

Continuum:       (1)       (2)       (3)
Subject         -6 dB      0 dB     +6 dB

S1              36.39     34.56     33.76
S2              39.73     36.95     33.77
S3              39.65     39.56     39.54
S4              33.96     37.77     37.42
S5              33.92     37.26     35.98
S6              41.18     36.00     41.58
S7              27.88     32.22     36.05

MEANS           36.10     36.33     36.71



Four subjects claimed that more than 25 percent of the initial consonants were neither /g/ nor /k/. They heard some stimuli with long VOTs as initiated by palatal affricates (for example, /tʃ/). Their data were qualitatively similar to those of the other subjects.

The data of one experienced subject were noise free but will be mentioned no further as he only heard instances of /g/.

The numbers of /g/ responses afforded each of the 72 stimuli by each of the ten consistent subjects were examined in a three-way univariate analysis of variance with the factors:

a) subjects (10),
b) F1 onset frequency (208, 311 or 412 Hz),
c) F1 transition extent (0, 100 or 200 Hz) and
d) VOT (15, 20, 25, 30, 35, 40, 45 or 50 msec).

The effects of both the major independent variables and their interaction were significant (F1 onset frequency: F[2,18]=28.64; p < 0.01. F1 transition extent: F[2,18]=11.38; p < 0.01. Interaction: F[4,36]=7.30; p < 0.01). Post-hoc comparisons made according to the criteria recommended by Scheffe (1959) show that increasing F1 onset frequency both from 208 Hz to 311 Hz and from 311 Hz to 412 Hz produced significant decreases in the percentage of /g/ responses (p < 0.0?). Increasing F1 transition extent from 0 Hz to either 100 Hz or 200 Hz also produced a significant decrease in the percentage of voiced percepts (p < 0.05), but no systematic effect resulted from the increase from 100 Hz to 200 Hz of F1 transition extent. The extent to which these results are manifest in individual comparisons may be examined in Table 3 where a difference of 12.6 percent between any pair of means is required for a posteriori significance at the p < 0.01 level.

Overall, the results show that increasing F1 onset frequency reduces the proportion of voiced percepts independently of the characteristics of any following F1 transition. The extent to which the presence of an F1 transition also reduces the proportion of voiced percepts depends on its onset frequency. The effect is largest for onsets at 208 Hz, and diminishes to zero as the onset is raised to 412 Hz.
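The pattern summarized above can be read off Table 3 with a few lines of arithmetic. The sketch below computes the mean percentage of /g/ responses at each onset frequency and, for each onset, the drop from the transitionless continuum to the one with a 200 Hz transition. The variable names are hypothetical, and the calculation is a descriptive summary of the cell means, not a substitute for the analysis of variance reported in the text.

```python
# Percentages of /g/ responses from Table 3, keyed by
# (F1 onset frequency in Hz, F1 transition extent in Hz).
table3 = {
    (208, 0): 77.8, (311, 0): 66.6, (412, 0): 33.5,
    (208, 100): 69.2, (311, 100): 48.9, (412, 100): 34.1,
    (208, 200): 60.4, (311, 200): 52.9, (412, 200): 41.5,
}
onsets = (208, 311, 412)
extents = (0, 100, 200)

# Mean over the three transition extents at each onset frequency.
onset_means = {
    o: sum(table3[(o, e)] for e in extents) / len(extents) for o in onsets
}

# Drop in /g/ responses when a 200 Hz transition replaces a steady F1.
transition_effect = {
    o: round(table3[(o, 0)] - table3[(o, 200)], 1) for o in onsets
}

for o in onsets:
    print(o, round(onset_means[o], 1), transition_effect[o])
```

The onset means fall from about 69 to 56 to 36 percent as onset frequency rises, while the transition effect shrinks from 17.4 points at 208 Hz to 13.7 at 311 Hz and reverses sign (-8.0, within the 12.6 point criterion) at 412 Hz.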

Control Condition 2

Two stimuli were added to the continuum used in Condition 1 in which F1 had its onset at 311 Hz with 0 Hz transition extent. The extended continuum ranged from +10 msec to +55 msec of VOT. It was duplicated twice to create a total of three continua in which the level of F1 relative to F2 was +6 dB, 0 dB and -6 dB. Seven naive subjects listened to a randomization comprising ten instances of each of the 30 stimuli, and indicated whether they perceived the initial consonant as /g/ or /k/. Table 4 shows their phoneme boundaries estimated by probit analysis (Finney, 1971).

These boundaries were examined in a two-way analysis of variance with the factors:

a) subjects (7) and
b) relative formant amplitude (+6 dB, 0 dB or -6 dB).



The effect of varying the relative amplitudes of F1 and F2 was not
significant (F[2,12]=0.093). Although one subject did show a small increase
in boundary position [...], others displayed the reverse pattern. Overall,
variation of the relative amplitudes of F1 and F2 in these continua produced
no systematic effect on the decision as to whether the initial stop was
voiced or voiceless.

Discussion

The perceptual effects of varying F1 onset frequency in Experiment I
could have been mediated by those covariations in relative and overall
formant amplitudes that the acoustic theory of speech production predicts,
and that an OVE synthesizer produces. Had that been so, no effects should
have resulted in Experiment II from varying the frequency of F1 while holding
its absolute and relative amplitude constant, but an appreciable effect
should have resulted from varying its amplitude while holding its frequency
constant. This was not the case. The opposite pattern was produced and
confirms that the critical aspect of F1 for the perceptual categorization of
members of VOT continua is its perceived frequency at the onset of voicing,
rather than its absolute or relative amplitude.

In Control Condition 1, the frequency extent of the F1 transition was varied
while holding its onset frequency fixed. The results of this manipulation
confirmed the second finding of Experiment I that a rising F1 transition
following the onset of voicing does not, in itself, increase the probability
of a voiced percept. Transitions onsetting at 250 Hz (in Experiment I) and at
208 Hz and 311 Hz (in Experiment II) significantly increased the probability
of voiceless percepts. The physiological representations of the separation
cue and the F1 onset cue could both be influenced by whether voicing onset is
accompanied by a rising, rather than a steady, F1. If there were less energy
in the critical band around the putative onset frequency of an F1 transition
than at the onset of a fixed frequency F1, then the separation interval might
be perceived as longer, and the F1 onset frequency as higher, than their
respective physical values. The data imply that the perceived onset of F1 in
these stimuli is determined by spectrotemporal integration over the duration
of the first two or three pitch pulses, but that the dependency of F1 onset
registration on spectrotemporal integration decreases as physical onset
frequencies increase from 200 Hz to 400 Hz.

Experiments I and II demonstrate that the perceived frequency of F1 at
the onset of voicing plays an identifiable role as a spectral parameter
influencing the voiced-voiceless decision. They do not determine whether it
is correct to impute to the frequency of the F1 peak the entire burden of
spectral influence or whether that influence derives from the distribution of
energy in the spectrum including both F1 and the higher formants. Lisker
(1975) considered this possibility to be unlikely, although the perceived
differences between his stimulus types can be economically summarized by
expressing the spectral influence as the weighted sum of an effect of F1 and
an effect of F2. A dependency of boundary location on the frequencies of



TABLES 5a, b, c: Obtained phoneme boundaries in msec of VOT and
boundaries predicted by the equation:

    Vb = 58 - 100[(2/5)Log(F1*/200) + (2/3)Log(F2*/1000)]

where:
    Vb is the predicted boundary in msec of VOT,
    F1* and F2* are the frequencies of the first
    and second formants at the onset of voicing.



TABLE 5a: The letters A, B, C, D and E identify five /g-k/ continua
as in the original paper.

Continuum   F1*    F2*    Obtained   Predicted   Difference

A           540    1232   39         40          +1
B           769    1232   30         29          -1
C           386    1232   43         41          -2
D           286    1845   35         34          -1
E           412    2000   24         25          +1

Data from Lisker (1975).



TABLE 5b: Predictions are made for three of the seven continua.

Continuum   F1*    F2*    Obtained   Predicted   Difference

1           200    2098   34         37          +3
5           300    215?   27         29          +2
7           400    2194   23         23           0

Data from Experiment I: Condition 1.



TABLE 5c: Predictions are made for eight /da-ta/ continua differentiated
by their F1 transition durations (F1 T-Dur.).

Continuum
(F1 T-Dur.)   F1*    F2*    Obtained   Predicted   Difference

 20           645    1200   21         32          +11
 25           575    1235   22         34          +12
 40           540    1320   30         33          +3
 55           478    1375   34         34           0
 70           452    1410   40         34          -6
 85           427    1445   44         34          -10
100           400    1480   45         34          -11
115           375    1500   46         35          -11

Data from Lisker et al. (1975).
Figure 7: Plot of the position of the voicing boundary (phoneme boundary in
msec of VOT) against the onset frequency of F1 for the data sets indicated.
The dotted line falling diagonally from left to right segregates the data
according to the frequency of F2 at the voicing boundary.

both F1 and F2 at the onset of voicing is economically expressed in the
otherwise arbitrary formula:

    Vb = 58 - 100[(2/5)Log(F1*/200) + (2/3)Log(F2*/1000)]

where: Vb is the predicted voicing boundary in msec of VOT.
       F1* and F2* are the frequencies in Hz of the first and
       second formants at the onset of voicing.

The values of the constants were derived by trial and error to fit Lisker's
(1975) data as shown in Table 5a. While the fit to Lisker's data is quite
good, suggesting a role for F2, and the expression adequately predicts the
boundary positions observed here in Experiment I (shown in Table 5b),
Table 5c shows that the equation fails to account for the data of Lisker,
Liberman, Erickson and Dechovitz (1975).
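
The arithmetic of the expression for Vb can be checked with a short sketch.
The function name predicted_boundary is ours, and the use of base-10
logarithms is an assumption (the text writes only "Log"); under that
assumption the sketch reproduces the Predicted column of Tables 5b and 5c
for the rows tried below.

```python
import math


def predicted_boundary(f1_onset_hz, f2_onset_hz):
    """Predicted voicing boundary in msec of VOT.

    Uses the constants quoted in the text. Base-10 logarithms are an
    assumption; the source writes only "Log".
    """
    spectral_term = ((2 / 5) * math.log10(f1_onset_hz / 200)
                     + (2 / 3) * math.log10(f2_onset_hz / 1000))
    return 58 - 100 * spectral_term


# (F1*, F2*, obtained boundary) taken from rows of Tables 5b and 5c.
rows = [(200, 2098, 34), (400, 2194, 23), (645, 1200, 21), (375, 1500, 46)]
for f1, f2, obtained in rows:
    predicted = round(predicted_boundary(f1, f2))
    # Difference follows the tables' convention: predicted minus obtained.
    print(f1, f2, obtained, predicted, predicted - obtained)
```

The first two rows (Experiment I) differ from the obtained boundaries by a
few msec at most, while the last two (Lisker et al., 1975) miss by about
11 msec in opposite directions, which is the failure noted in the text.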

Figure 7 displays a plot of obtained phoneme boundary location as a
function of F1 onset frequency, for data reported in the present paper, and by
Stevens and Klatt (1974), Lisker (1975), Lisker et al. (1975), and Darwin
and Brady (1975). There are two important features of this display. First,
the inverse relationship between the onset frequency of F1 and the position
of the voicing boundary demonstrated in the present experiments is equally
apparent in the other sets of data plotted here. Second, despite the failure
of the equation to describe the data of Lisker et al., the remaining data do
justify the search for some description of spectral influences that includes
the frequency of F2 in addition to that of F1. The dotted line in Figure 7
falling diagonally from left to right segregates the data according to the
frequency of F2 incorporated in the stimuli. Results obtained from stimuli
in which F2 was above 1500 Hz fall below this line; those in which F2 was
below 1500 Hz fall above the line. The pattern suggests that lowering the
frequencies of both F1 and F2 can cause the voicing boundary to shift to
longer VOTs. In addition, it appears that the more diffuse the spectrum, the
larger is the effect of varying F1 onset frequency.

While this is one explanation for the pattern of data in Figure 7, it is
also possible that the pattern reflects the effects of variations in voicing
cues quite different from those considered here [see Klatt (1975) for a
review], and the effects of different strategies for synthesis and the use of
different groups of listeners. Resolution of these alternatives requires
that the same group of listeners categorize the members of a set of VOT
continua whose vocalic contexts are characterized by a range of F1
frequencies in combination with a range of F2 frequencies. This was done in
Experiment III.

EXPERIMENT III

Stimuli and Procedure

Sixteen /d-t/ VOT continua were synthesized on the parallel formant
synthesizer at Haskins Laboratories. The continua included identical
synthesis control parameter specifications for F3, F0 and the overall and
individual formant amplitudes. They were distinguished only by differences
in the frequency contours of their first and second formants. Sixteen
continua were formed by combining each of four F1 steady-state frequencies
(208, 412, 614 and 819 Hz) with each of four F2 steady-state frequencies
(1001, 1306, 1611 and 1917 Hz). This range of formant frequencies includes
vowels not found in the English vowel system. Transitions in F1, F2 and F3
were linear in frequency/time over their duration of 35 msec. F1 transitions
rose from 208 Hz at stimulus onset to the appropriate steady-state. F2 onset
frequencies were computed so that the extrapolated trajectories of F2
transitions originated at 1800 Hz 50 msec before syllable onset. The F3
transition had its onset at 2861 Hz and fell to a steady-state at 2527 Hz.
All stimuli included a fricated burst centered on 4000 Hz and lasting 10 msec
from stimulus onset. Each stimulus was 300 msec in duration. Over the first
100 msec, the fundamental frequency was constant at 110 Hz. Figure 8
includes schematic displays of the formant parameter specifications of the
stimuli.

Figure 8: Schematic spectrograms showing the patterns of the first three
formants for the stimuli used in Experiment III in exemplars with VOTs of
0 msec (left) and +20 msec (right). Sixteen VOT continua were created by
combining each of four F1 contours ([1]-[4]) with each of four F2 contours
([1]-[4]). The stimuli included a 10 msec burst centered on 4000 Hz that is
not shown.
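
The transition geometry described for these stimuli can be sketched as
follows. This is a minimal illustration, not the synthesis procedure itself:
the function names are ours, and we assume that the extrapolated F2 line runs
from 1800 Hz at 50 msec before syllable onset to the steady-state value at
the end of the 35 msec transition.

```python
def f2_onset_hz(f2_steady_hz, locus_hz=1800.0, locus_lead_ms=50.0,
                transition_ms=35.0):
    """F2 at syllable onset (t = 0), read off the straight line joining
    (-locus_lead_ms, locus_hz) to (transition_ms, f2_steady_hz).
    The end point of that line is an assumption, see the lead-in."""
    slope = (f2_steady_hz - locus_hz) / (transition_ms + locus_lead_ms)
    return locus_hz + slope * locus_lead_ms


def formant_at(t_ms, onset_hz, steady_hz, transition_ms=35.0):
    """Frequency of a formant t_ms after syllable onset, for a transition
    that is linear in frequency/time and then holds its steady state."""
    if t_ms >= transition_ms:
        return steady_hz
    return onset_hz + (steady_hz - onset_hz) * t_ms / transition_ms


# F1 at voicing onset for a VOT of +20 msec on the continuum whose F1
# steady state is 819 Hz (F1 rises from 208 Hz over 35 msec).
print(formant_at(20, 208, 819))
# F2 at syllable onset for the lowest F2 steady state.
print(f2_onset_hz(1001))
```

Evaluating formant_at at the boundary VOT gives the kind of F1* and F2*
values that the tables report for the onset of voicing.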

Each continuum consisted of 10 members with VOTs of +5, +10, +15, +20,
+25, +30, +35, +40, +45 and +50 msec formed by replacing periodic excitation
with noise excitation in F2 and F3 and eliminating energy in F1. The onset
of pitch-pulsing was synchronized to the intended VOT in every stimulus.

Ten naive subjects listened to a randomization that included 10
instances of each of the 160 stimuli over Grason-Stadler TDH39-300Z headphones
at a constant peak intensity of 85 dB SPL. They were instructed to make a
forced-choice identification of the initial consonant of each stimulus as
either /d/ or /t/ and to indicate their percept by writing 'D' or 'T'. In
addition, subjects were instructed to mark with a '?' any response about
which they were not confident.

Results

Despite being presented with a bizarre array of vowels, most subjects
experienced little difficulty in performing the task. While four subjects
did indicate that many of their responses to the members of the four continua
with F1 set to 200 Hz were guesses, no subject performed inconsistently with
stimuli drawn from the other continua.

The data were examined in three ways in different univariate analyses of
variance. The first examined the sums of the numbers of 'D' responses made
to each stimulus by each subject, according to the four factors:

a) Subjects (10),

b) F1 steady-state (208, 412, 614 or 819 Hz),

c) F2 steady-state (1001, 1306, 1611 or 1917 Hz) and

d) VOT (+5, +10, +15, +20, +25, +30, +35, +40, +45 or +50 msec).

Both the main effect of F1 (F[3,27]=26.27; p < 0.01) and its interaction
with VOT (F[27,243]=13.27; p < 0.01) were significant. Neither the main
effect of F2 (F[3,27]=0.68; p > 0.20) nor its interaction with VOT was
significant. The data provided by the six subjects who performed
consistently on all sixteen continua were examined in probit analyses that
fitted
ogives to the data from each subject for each continuum. Two parameters were
estimated for each ogive: the physical stimulus value corresponding to the
p = 0.5 point on the psychometric function, and the slope of the probit
regression. The first parameter is an estimate of the phoneme boundary. The
second varies directly with the standard deviation of the psychometric
function underlying the test continuum and hence reflects the slope of the
identification function at the boundary. The two parameters were examined in
separate analyses with the factors:

a) Subjects (6),
b) F1 steady-state (208, 412, 614 or 819 Hz) and
c) F2 steady-state (1001, 1306, 1611 or 1917 Hz).
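
How a boundary and a slope can be read off an identification function is
sketched below. This is a simplified stand-in for the probit analysis cited
in the text (Finney, 1971): it fits an unweighted least-squares line to the
probit-transformed proportions rather than the iteratively weighted fit of a
full probit analysis, and the function name and the synthetic data are ours.

```python
from statistics import NormalDist


def fit_probit_line(vots_ms, prop_voiced):
    """Fit z = intercept + slope * VOT, where z is the normal deviate of
    each proportion of voiced responses. Returns (boundary, slope); the
    boundary is the VOT at which the fitted proportion is 0.5 (z = 0).
    Proportions of exactly 0 or 1 have no finite deviate and must be
    excluded by the caller."""
    z = [NormalDist().inv_cdf(p) for p in prop_voiced]
    n = len(vots_ms)
    mean_x = sum(vots_ms) / n
    mean_z = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in vots_ms)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(vots_ms, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope, slope


# Synthetic identification function: a cumulative normal ogive with its
# 50 percent point at 27 msec (illustrative values, not data).
vots = list(range(5, 55, 5))
ogive = NormalDist(mu=27.0, sigma=8.0)
voiced = [1.0 - ogive.cdf(v) for v in vots]
boundary, slope = fit_probit_line(vots, voiced)
print(boundary, slope)
```

The slope is in probit units per msec and is negative here because the
proportion of voiced responses falls as VOT increases.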



Analysis of the 50 percent intercepts that correspond to the phoneme
boundary showed a significant effect of F1 (F[3,15]=35.95; p < 0.001),
nonsignificant effects of F2 (F[3,15]=0.84; p > 0.2), and no F1xF2
interaction (F[9,45]=0.48; p > 0.2). Analysis of the boundary slopes also
showed a significant effect of F1 (F[3,15]=75.00; p < 0.025), nonsignificant
effects of F2 (F[3,15]=0.05; p > 0.2), and no F1xF2 interaction
(F[9,45]=1.93; p > 0.1).



These results may be assessed in relation to the plots in Figure 9 where
boundary position is plotted against the steady-state frequency of F2 for
each value of F1 steady-state frequency. Only data provided by the six
subjects who performed consistently on all sixteen continua are represented.
The plots corresponding to each value of F1 onset frequency are horizontal,
illustrating the lack of any dependency of boundary position on F2 onset
frequency. Means obtained by averaging over these subjects are tabulated in
Table 6, which shows that as the F1 steady-state increases in frequency, two
things do happen: phoneme boundaries shift to shorter VOTs and the slopes of
the probit regressions, and hence of the identification functions at the
boundary, become steeper.



Discussion

It is clear that, overall, the perceived frequency of F2 at the onset of
voicing plays an insignificant role in determining how listeners categorize
the members of /d-t/ VOT continua as voiced or voiceless.^3 It is unlikely
that the absence of an F2 effect here, as contrasted with Lisker's (1975)
data, results from our use of the alveolar rather than the velar place of
production. Comparison of the data from Experiment III with that plotted in
Figure 7 shows our velar and alveolar data to correspond quite precisely.

While Lisker's (1975) data remain anomalous, the present result is congruent
with two earlier observations. Summerfield (1974a) varied the durations of
syllable-initial F1, F2 and F3 transitions in the members of /ga-ka/ and
/gi-ki/ VOT continua. This produced a systematic change in the position of the



^3 However, see Draper and Haggard (1974), Sawusch and Pisoni (1974), and Repp
(1976) for discussions of effects on the perception of place and voicing
deriving from the microstructure of F2 and F3 transitions, as opposed to the
Figure 9: Results of Experiment III for six individual subjects who performed
consistently on all sixteen continua. For each subject four empirical
functions relate the position of the voicing boundary (estimated by probits)
to the frequency of the F2 steady state for each of four values of F1 steady
state. The functions are essentially horizontal, showing no dependency of the
position of the voicing boundary on the spectral characteristics of F2.



TABLE 6: Experiment III.

Phoneme boundary positions in msec of VOT averaged over six subjects
whose data were internally consistent on all sixteen continua.

The continua were distinguished by the frequency of their F2 steady-
states (1001 Hz, 1306 Hz, 1611 Hz or 1917 Hz) and the frequency of
their F1 steady-states (208 Hz, 412 Hz, 614 Hz or 819 Hz).

Four values are indicated for each continuum. The first is the
position of the average phoneme boundary in msec of VOT [PB]. The
second is the average slope of the Probit regression line [SL]. Its
units are (Probit of [+voiced] responses)/(msec).

The third and fourth values are the frequencies of the first and
second formants at the mean phoneme boundary locations (F1* and F2*).
phoneme boundary in /a/-context, where there was an extreme F1 transition,
whose onset frequency at any given VOT varied with transition duration.
However, there was no effect in /i/-context, where, despite a negligible F1
transition, there were appreciable transitions in F2 and F3 whose onset
frequencies did vary. Lisker et al. (1975) varied the durations of the F2
and F3 transitions independently of that in F1 in the members of a /da-ta/
continuum. Systematic changes in the position of the voicing boundary
resulted from manipulations of F1, but not from those of F2 and F3. The
results of Experiment III augment these earlier findings. They demonstrate
that the major spectral influence on the perception of stop-voicing resides
in F1 and is not distributed throughout the entire spectrum. Perceptual
behavior is explained in terms of the direct acoustic effects of particular
vocalic environments on the voicing cues without the invocation of feedback
from the phonetic identification of the vowel.

For each steady-state frequency of F2 used in Experiment III, the
empirical function relating the position of the phoneme boundary to the onset
frequency of F1, if plotted in Figure 7, would cross the dotted line that has
been purported to segregate results according to the frequency of F2
incorporated in the stimuli. Clearly, a different rationale for the pattern
of data in Figure 7 is required. The explanation may be found in the
observation that the different data sets displayed derived from stimuli with
different overall durations. The stimuli of Lisker et al. (1975) and Darwin
and Brady (1975) had durations of 600 msec, while those of Lisker (1975) were
450 msec, and those used in the present experiments were 300 msec in
duration. Summerfield and Haggard (1972) observed that increasing the
duration of the steady-state portion of a CV syllable with a fixed VOT
increased the probability that the initial consonant would be perceived as
voiced. They argued that this finding demonstrated perceptual sensitivity to
acoustic covariants of speech rate. We have replicated this finding and
found that an increase from 90 msec to 310 msec in the duration of the vowel
in the members of a /biz-piz/ continuum shifts the position of the voicing
boundary by about 7 msec. A simple mechanism that could simulate this effect
would scale the duration of the separation interval in a stimulus in relation
to the total duration of the syllable, combine the scaled duration with
measures of other pertinent cues, and compare the combined cue-value with a
criterion value to determine the value of the voicing feature. If the effect
of manipulating the physical value of another cue, for example, F1 onset
frequency, were assessed by measuring changes in the position of the voicing
boundary expressed in terms of the physical value of the separation interval,
then the measured effect would increase as the total duration of the stimulus
increased. The relation between the present data and that of Lisker et al.
(1975) and Darwin and Brady (1975) is congruent with this rationale; larger
effects of F1 onset frequency variation were produced by these authors'
600 msec stimuli than by our 300 msec stimuli. This explanation remains to
be tested and does not account for the patterns of Stevens and Klatt's (1974)
and Lisker's (1975) data; those data remain anomalous.
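
The scaling mechanism proposed in this paragraph can be illustrated with a toy
model. Everything numerical here is an assumption made for illustration: the
criterion, the weight given to F1 onset frequency, the logarithmic form of
the F1 cue, and strict proportional scaling by total duration are ours and
are not fitted to any data.

```python
import math


def boundary_ms(total_ms, f1_onset_hz, criterion=0.12, f1_weight=0.1):
    """Separation interval (msec) at which the combined cue value equals
    the criterion in a toy model where

        combined cue = separation / total duration + F1 cue

    and a stop is judged voiceless when the combined cue exceeds the
    criterion. All constants are illustrative assumptions."""
    f1_cue = f1_weight * math.log10(f1_onset_hz / 200.0)
    return total_ms * (criterion - f1_cue)


def f1_effect_ms(total_ms, low_hz=200.0, high_hz=400.0):
    """Boundary shift, in msec of separation interval, produced by
    raising F1 onset frequency from low_hz to high_hz."""
    return boundary_ms(total_ms, low_hz) - boundary_ms(total_ms, high_hz)


for total in (300, 450, 600):
    print(total, boundary_ms(total, 300.0), f1_effect_ms(total))
```

In this sketch a longer stimulus moves the boundary to a longer physical
separation interval, and the same change in F1 onset frequency produces a
boundary shift that grows in proportion to total duration, which is the
qualitative pattern the paragraph describes.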
GENERAL DISCUSSION

Trading Relationships in Production and Perception

These results identify the perceived frequency of the first formant at
the onset of voicing as the critical spectral parameter influencing the
perceptual categorization of members of VOT continua. They have shown that a
larger value of the separation interval, the purely temporal component of
VOT, is required for the perception of a voiceless stop when F1 has a low
onset frequency (indicating greater vocal tract constriction), and vice versa.
This trading relationship corresponds elegantly with one in production.

In production, oral release gestures of differing extents made by the
same articulators nevertheless tend to require the same length of time (for
example, Kent and Moll, 1969; Perkell, 1969). It is observed that VOT varies
inversely with both the rate at which the oral release gesture is made and
with the degree of vocal tract constriction required by the phoneme following
the stop. Thus, longer VOTs characterize velar stops, compared to alveolars,
compared to bilabials (Lisker and Abramson, 1964); VOTs tend to be longer
before the vowel /i/ than before /a/ (Klatt, 1975; Summerfield, 1975a); VOTs
are longer in stop+/r/+vowel and stop+/l/+vowel environments than in
stop+vowel environments (Klatt, 1975).^4 It is not entirely clear why this
relationship occurs in production. A relatively constricted vocal tract both
increases the acoustic load on the glottal source (Flanagan and Landgraf,
1968), and may also retard the attainment of the transglottal pressure drop
necessary for vocal cord vibration (van den Berg, 1968). Klatt (1975) points
out, however, that passive aerodynamics can only contribute to variations in
VOT observed in productions of voiced stops, since in voiceless productions
the supraglottal pressure established during the occlusive phase is entirely
dissipated during the fricative portion of the stop-release and is at



^4 L. Lisker (1961) Voicing lag in clusters of stop plus /r/. Haskins
Laboratories Final Report on Speech Research and Instrumentation
(unpublished). Lisker reports VOTs measured in syllable-initial voiceless
stops preceding a vowel and preceding /r/+vowel as follows:

/p/: +61 msec, /pr/: +89 msec;
/t/: +64 msec, /tr/: +110 msec;
/k/: +77 msec, /kr/: +107 msec.

Klatt (1975) reports similar data for voiceless plosives and the
following data for voiced plosives:

/b/: +7 msec, /br/: +1? msec;
/d/: +14 msec, /dr/: +29 msec;
/g/: +23 msec, /gr/: +32 msec.

It is noteworthy that a putatively voiced syllable-initial /gr/ can be
characterized by a VOT almost twice as large as the simultaneity threshold
(Hirsh, 1959) that has been invoked as a psychoacoustic basis for the
voicing distinction in English (for example, Miller et al., 1976; Pisoni, in
press).






atmospheric level at the time when vocal cord adduction is initiated. He
suggests that, to offset the inherently low frequency of F1 when stops are
produced before a close vowel or a lateral, the timing of glottal adduction
relative to oral release could be actively delayed.^5 It is fairly
parsimonious to postulate such learned compensation in production. Perceptual
sensitivity to the summed cue values of separation interval and F1 onset
frequency is already required, whatever the habits of production may be. By
pooling measures of these two cues at a low level, the noninvariance problem
for perception is reduced. This perceptual summation should apply equally in
the speaker's perception of his own productions. As a quid pro quo,
production could be expected to develop vowel-contingent modifications to
delay adduction in order to permit a general criterion value of the summed
measure to characterize phoneme boundaries in most circumstances. Possibly,
small passive aerodynamic effects of the adjacent vowel upon voicing onset
occur in unstressed syllables, while larger delays result from controlled
adduction delay in stressed syllables.

The identification of the role of F1 onset frequency permits the
rationalisation of a group of previously reported results. In Figure 10,
four F1 transition contours that might be incorporated in the members of
synthetic VOT continua are schematized. Transitions [a] and [b] differ in
duration, while transitions [b] and [c] differ in spectral extent. Contour
[d] evinces no transition. Were voicing to onset at time T1 msec, F1 onset
frequencies of Fa, Fb, Fc and Fd Hz would result. The diagram exemplifies,
as Lisker et al. (1975) have emphasized, that variation in either the
temporal duration or the frequency extent of an F1 transition results in
covariation of F1 onset frequency at any given VOT. Thus, effects previously
attributed to F1 transitions following experimental manipulation of either
transition duration (Stevens and Klatt, 1974; Summerfield, 1974a) or
frequency extent (Summerfield and Haggard, 1974), where the F1 steady state
was fixed, are more appropriately ascribed to variation in F1 onset frequency.
Similarly, phoneme boundaries on VOT continua involving the vowel /i/ (with a
low frequency F1 in the vowel and hence little or no F1 transition) fall at
longer VOTs than do those on continua with the vowel /a/ (with a high
frequency F1 in the vowel and a potentially extensive F1 transition) (Cooper,
1974; Summerfield, 1974a; 1975b); that finding is rationally explained by the
necessarily lower F1 onset frequency in /i/-context. (Compare contours [b]
and [d] in Figure 10.) These results would be paradoxical if the F1 transition
were considered to be a cue to voicedness; the paradox led Summerfield and




^5 We and our colleague Peter Bailey have recently measured periods of
devoicing and VOTs in productions of /p/, /t/ and /k/ before /i/, /a/, /ri/
and /ra/ in bisyllables such as /bepri/. Total periods of devoicing (that
is, the time from the disappearance of periodicity in the waveform at
approximately the moment of stop closure to its reemergence at voicing
onset) tend to be more invariant than either the period of devoicing
preceding oral release or the VOT itself. Possibly observed covariations of
VOT with the degree of vocal tract constriction required by the following
phoneme reflect an active process in which it is the moment of oral release
that is varied within a fixed time-frame of adduction-abduction.




Figure 10: Schematic descriptions of four syllable-initial first formant
contours ([a], [b], [c], [d]) which could be incorporated in the
members of different VOT continua. Were voicing to onset at time
T1, first formant onset frequencies of Fa, Fb, Fc and Fd Hz would
result.



Haggard (1974) to consider a possibility that they otherwise acknowledged to
be unparsimonious, namely that the perceptual weightings of measures of the
temporal and spectral aspects of VOT might be conditioned by vocalic context.
With F1 onset frequency identified as the critical spectral parameter, there
is no need for such feedback, and the voicing decision may be reached without
reference to the category of phoneme following the stop. (See also Darwin
and Brady, 1975.) Further methodological implications are reviewed in a
following section.
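
The covariation illustrated by Figure 10 can be made concrete with a small
sketch. The contour values below are hypothetical stand-ins for [a] to [d]
(the figure gives no numbers), and the function name is ours; the only thing
taken from the text is that the transitions are treated as linear and that
voicing begins at a common time T1.

```python
def f1_at_voicing_onset(vot_ms, start_hz, steady_hz, duration_ms):
    """F1 frequency at the moment voicing begins, for an F1 contour that
    rises linearly from start_hz to steady_hz over duration_ms and then
    stays at steady_hz. A duration of 0 means no transition."""
    if duration_ms <= 0 or vot_ms >= duration_ms:
        return steady_hz
    return start_hz + (steady_hz - start_hz) * vot_ms / duration_ms


# Hypothetical contours: (start Hz, steady-state Hz, duration msec).
contours = {
    "a": (200.0, 700.0, 30.0),   # same extent as [b], shorter duration
    "b": (200.0, 700.0, 60.0),
    "c": (400.0, 700.0, 60.0),   # same duration as [b], smaller extent
    "d": (300.0, 300.0, 0.0),    # no transition
}
t1_ms = 20.0
onsets = {name: f1_at_voicing_onset(t1_ms, *spec)
          for name, spec in contours.items()}
print(onsets)
```

With the steady state fixed, shortening the transition ([a] against [b]) or
reducing its extent ([c] against [b]) both raise the F1 frequency found at
T1, so either manipulation alters F1 onset frequency at a given VOT.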

The results obtained here may reflect the effects of another, less
influential spectral parameter, in addition to F1 onset frequency. The
schematic displays in Figures 1, 2, 5 and 8 show that the constraints that
were applied to the acoustic structure of the stimuli necessarily resulted in
covariation of the frequency of the F1 steady-state with, in different
conditions, either the onset frequency of F1, or the extent of the F1
transition. Increases in both these latter variables raised the probability
that a stop-consonant characterized by a particular VOT would be perceived as
voiceless. Thus, the results exhibit a correlation between the frequency of
the F1 steady-state and the probability of a voiceless percept. Experiment I
showed that there is not a strong causative relationship between the two.
However, the results do not eliminate the possibility that there may be some
influence. Stevens and House (1963) noted that the contour of F1 in the
vocalic portions of natural CV syllables is lower in frequency following
voiced, as opposed to voiceless, stops, reflecting the increase in vocal
tract length that results from the lower position of the larynx in voiced
productions (for example, Ewan and Krones, 1974). This aspect of
articulatory behavior increases the spectral difference in F1 at voicing onset
between voiced and voiceless productions. It remains to be determined
whether an additional perceptual effect derives from the coarticulated
variation in the F1 steady-state.

First Formant Transitions and First Formant Onsets

The failure of an F1 transition to cue voicing in adults raises doubts
about Stevens and Klatt's (1974) suggestion as to its perceptual primacy for
the perception of voicing contrasts in infants. Such wariness is reinforced
by two recent findings. First, demonstrations of the categorical perception
of the members of continua formed by varying the relative onset times of
noise and buzz segments (Miller, Pastore, Wier, Kelley and Dooling, 1976) and
pairs of sine waves (Pisoni, in press) have confirmed Hirsh's claim (Hirsh,
1959; Hirsh and Sherrick, 1961) that a natural psychoacoustic boundary
between the perception of successive and simultaneous coterminous acoustic
events occurs at a temporal offset of about 17 msec. Although, as the results
of the present experiments show, the perception of voicing contrasts involves
the registration of the spectral concomitants of the interval between
release and voicing onset, psychoacoustic considerations may well dictate why
a temporal interval is the basis of the voicing distinction in general
(whether positive or negative values of VOT are involved), and why in
particular many of the world's languages place a category boundary between
VOTs of 0 and +40 msec. The second difficulty for the supposed primacy of F1
transitions comes from a developmental study by Simon (1974). He showed that
children older than eight years do not categorize any members of a
'Goat-Coat' VOT continuum as initiated by [g], unless they contain a low F1 onset
frequency. Children younger than five years, on the other hand, indicate
that they have perceived [g] in the absence of the spectral cue and appear to
be primarily sensitive to variation in the temporal cue. These results
support Lisker's assertion of the primacy of the temporal aspect of VOT and
suggest that it is the ability to detect the spectral cue that is learnt.

At present, it is not clear whether infants' behavior in discriminating
members of VOT continua (c.f. Eimas et al., 1971; Streeter, 1976) represents
a psychoacoustic ability to distinguish successive from simultaneous acoustic
events, or a phonetic ability to distinguish voiced from voiceless stops
(Pisoni, in press). The alternatives could be dissociated by experimenting
with VOT continua (for example, /gri-kri/) on which the phoneme boundary, by
virtue of a low F1 onset frequency, occurred at a considerably longer VOT
than the simultaneity-successivity threshold. Would infants discriminate
better across the psychoacoustic boundary, the phonetic boundary, or both?^6

Implications for Studies Using Stimuli Drawn from VOT Continua

The demonstration that the temporal and spectral components of VOT may
be traded for one another and that, by implication, each possesses perceptual
potency in cueing the voicing distinction, has methodological import for
studies whose stimuli are drawn from VOT continua.

Where F1 transition duration is held unnaturally constant across continua
that represent articulations in which it would normally vary, the
positions of phoneme boundaries should not vary. Darwin and Brady (1975)
synthesized /de-te/ and /dri-tri/ continua with identical parametric
specifications of F1. The perceptual identification functions for the two continua
differed slightly, but in the reverse direction from that to be expected if
the boundary locations were determined by phonetic class: boundaries on the
/dri-tri/ continuum occurred at shorter VOTs than those on the /de-te/
continuum. Lisker et al. (1975) synthesized /ba-pa/, /da-ta/ and /ga-ka/
continua with identical transition specifications for F1. Boundaries on
these three continua coincided, in contrast to those obtained in Lisker and
Abramson's original (1967) study where the duration of the transition
covaried naturally with place of production.^

If VOT continua involve cutback of the duration/frequency-extent of an
F1 transition, then variation in VOT over the duration of this transition
(for example, between times T1 and T2 in Figure 10) will alter the physical



^The value of this test would be nullified if the psychoacoustic simultaneity
threshold varied as a function of the frequency of the lower spectral
component of the stimulus. This possibility is currently under investigation.

^A small place-voicing correlation, equivalent to a shift in the VOT boundary
of about plus or minus 2 msec, remains even when all acoustic differences
between stimuli are neutralized (see Draper and Haggard, 1974; Sawusch and
Pisoni, 1974; Repp, 1976; Miller, in press).

values of both cues. Equivalent variation beyond the end of the transition
(for example, between times T3 and T4), or on continua not involving an F1
transition (for example, between either T1 and T2 or between T3 and T4 on
contour (d)), will alter only the value of the separation cue. If, as the
results of the present experiments suggest, the decision as to the value of
the voicing feature may be represented as being based on a combination of
analogue measures of these two cues and others (Hoffman, 1958; Haggard, 1974;
Summerfield, 1974b), then the perceptual effect of a particular change in
VOT will depend upon the magnitude of the change in the combined value of the
cues that it produces. A VOT shift that changes the physical values, and
hence the perceptual measures, of both cues should produce a larger perceptual
effect than should one that only varies the value of the separation cue.
It is likely, in addition, that the perceptual scaling of the temporal
separation component of VOT for values greater than the simultaneity-
successivity threshold approximates Weber's Law (Abel, 1972; Miller et al.,
1976). As a result of both these factors, the perceptual effect of a change
in VOT of fixed size should diminish as the absolute VOT on which that change
is centered increases. The perceptual consequences of the two factors have
not been dissociated, although effects have been observed that reflect their
joint operation. Pisoni and Lazarus (1974) carried out 4IAX discrimination
tests of members of a /ba-pa/ continuum involving syllable-initial formant
transitions of 50 msec duration. They noted that discrimination of stimuli
differing in VOT by 20 msec was more accurate in the voiced range of VOTs
from 0 to 40 msec, where the physical values of both cues were changing, than
in the voiceless range above 40 msec. Similarly, Summerfield (1975c) measured
phoneme boundary widths, defined as the difference between the VOTs
corresponding to 25 percent and 75 percent voiced responses, for each of eight
subjects on a /ga-ka/ continuum that was synthesized with an extensive rising
transition of 60 msec duration and on a /gi-ki/ continuum that was
synthesized with no F1 transition. Boundary width, in this definition,
relates inversely to discrimination in the boundary region and should reflect
the rate of change of the combined value of the two cues at the boundary.
Mean phoneme boundaries occurred at +29.0 msec in /a/-context and at
+41.6 msec in /i/-context. Mean boundary widths were 6.6 msec in /a/-context
and 10.5 msec in /i/-context. Each of the eight subjects displayed larger
boundary widths on the /gi-ki/ continuum than on the /ga-ka/ continuum.
Similarly, estimates of the slope of the psychometric functions underlying
the continua in Experiment III decreased significantly as mean phoneme
boundary location increased. In all these studies, discrimination of VOT
differences was best (a) at shorter as opposed to longer VOTs, and (b) when
the change in VOT to be discriminated varied both the separation interval and
the onset frequency of the first formant.
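
The boundary-width measure defined above can be made concrete with a short sketch. It assumes a hypothetical identification function (the percentages below are invented for illustration and are not data from any of the studies cited) and uses linear interpolation between tested VOTs; the helper name `vot_at` is likewise an assumption.

```python
def vot_at(percent_voiced, vots, voiced):
    """Linearly interpolate the VOT at which a falling identification
    function crosses the given percentage of voiced responses."""
    points = list(zip(vots, voiced))
    for (v0, p0), (v1, p1) in zip(points, points[1:]):
        if p0 >= percent_voiced >= p1 and p0 != p1:
            return v0 + (p0 - percent_voiced) * (v1 - v0) / (p0 - p1)
    raise ValueError("function never crosses the requested percentage")

# Hypothetical data: VOT (msec) and percent "voiced" responses.
vots = [10, 20, 30, 40, 50]
voiced = [100, 90, 40, 5, 0]

# Phoneme boundary: the 50 percent crossover.
boundary = vot_at(50, vots, voiced)
# Boundary width: VOT at 25 percent voiced minus VOT at 75 percent voiced.
width = vot_at(25, vots, voiced) - vot_at(75, vots, voiced)
```

A steeper identification function gives a smaller width, which is why width relates inversely to discrimination in the boundary region.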

An implication of these observations is that the size of the change in
the position of the phoneme boundary on a VOT continuum induced by a given
difference in some contextual variable will be greatest when the induced



^A. Q. Summerfield (1975c) Information-processing analyses of perceptual
adjustments to source and context variables in speech. Doctoral Dissertation,
The Queen's University of Belfast (unpublished).

change occurs at a large mean VOT and only varies the duration of the
separation cue. It will be smallest when the change occurs at short VOTs and
varies both the onset frequency of F1 and the duration of the separation cue.
Summerfield (1975b) measured the size of shifts in the phoneme boundary on
VOT continua caused by variation in the syllabic rate of phrases that
introduced test syllables drawn from the continua. On continua synthesized
with the vowel /i/ (where F1 was low in frequency and there was only a small
F1 transition), phoneme boundaries fell at longer VOTs and larger phoneme
boundary shifts were measured than on continua with the vowel /a/ (where
there was an extensive F1 transition). The observations confirm the above
deductions concerning discriminability and lend force to recent warnings by
Abramson (1976) that the VOT dimension, though a simple temporal continuum
when viewed in articulatory terms, involves variation in a complex set of
acoustic parameters whose relative availability is a function of both
absolute VOT and phonetic context. The interpretation of data obtained with
stimuli drawn from such continua is only valid if it takes this complexity
into account.
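
One of the two factors behind this deduction, Weber-type scaling of the separation cue, can be sketched numerically. Under that assumption a constant perceptual step corresponds to a VOT change proportional to the VOT at which it occurs. The Weber fraction of 0.1 below is a placeholder chosen for illustration, not a value reported here; the two base VOTs are the mean boundaries quoted earlier for the /a/ and /i/ contexts. The sketch ignores the second factor (whether F1 onset frequency also changes).

```python
WEBER_FRACTION = 0.1  # hypothetical constant, chosen only for illustration

def equivalent_vot_change(base_vot_msec, weber_fraction=WEBER_FRACTION):
    """VOT change (msec) that yields one constant perceptual step
    if the separation cue follows Weber's Law."""
    return weber_fraction * base_vot_msec

shift_a = equivalent_vot_change(29.0)  # mean boundary in /a/ context
shift_i = equivalent_vot_change(41.6)  # mean boundary in /i/ context
```

On this assumption alone, the same perceptual adjustment moves the boundary further, in milliseconds, on the continuum whose boundary lies at the longer VOT.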

SUMMARY AND CONCLUSIONS

The experiments reported here permit two conclusions: (1) The perceived
onset frequency of F1 is the critical spectral parameter included in the
repertoire of cues to the voicing decision for syllable-initial prestressed
stop consonants in English. The spectral influence derives only from F1, not
from the spectrum comprising F1 and the higher formants. (2) A periodically
excited, rising first formant transition is not, per se, a positive cue to
voicing when its onset frequency is controlled.^

In perception, the temporal separation component of VOT and the F1 onset
frequency component may be traded one for the other: the lower the frequency
of F1 at the onset of voicing, the longer the separation interval required to
produce a voiceless percept. This trading relationship parallels one in
production where VOT varies inversely with the degree of vocal tract
constriction, and hence with the frequency of F1, required by the phoneme
following the stop consonant.
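
The perceptual trading relation can be illustrated with a linear cue-combination sketch. Everything numerical here is hypothetical: the weights, the criterion, and the linear form itself are assumptions made for illustration, not parameters estimated in these experiments.

```python
# Hypothetical weights: evidence for "voiceless" grows with the separation
# interval (msec) and with F1 onset frequency (Hz).
W_SEPARATION = 1.0
W_F1_ONSET = 0.05
CRITERION = 50.0

def voiceless_evidence(separation_msec, f1_onset_hz):
    """Combined analogue value of the two cues."""
    return W_SEPARATION * separation_msec + W_F1_ONSET * f1_onset_hz

def boundary_separation(f1_onset_hz):
    """Separation interval at which the combined value reaches the criterion."""
    return (CRITERION - W_F1_ONSET * f1_onset_hz) / W_SEPARATION

low_f1 = boundary_separation(200.0)   # low F1 onset
high_f1 = boundary_separation(600.0)  # high F1 onset
```

With any positive weights of this kind, a lower F1 onset requires a longer separation interval to reach the same criterion, which is the direction of the trade described above.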

The greater role of F1 onset frequency than of F1 transition here does
not imply that transition characteristics are never important in speech
perception. A rising first formant at the onset of a pattern of formant
frequencies signals an obstruent articulation and is more likely to predispose
a consonantal percept than is a fixed-frequency transitionless first


^Not all aspects of the present results are entirely novel. Liberman,
Delattre and Cooper (1958) noted that cutting back F1 changed the values of
two correlated variables: the onset time of F1 relative to F2 and F3, and
the onset frequency of F1. They demonstrated that relative onset time has
perceptual significance independent of onset frequency. Whether F1 onset
frequency had independent perceptual significance was not reported at the
time. The intervening years have enabled us to bring more sophisticated
synthesis, psychophysical methods, and both psychological and articulatory
interpretations to the classical problem of specifying the cues.



formant onsetting at the same frequency. Such a rapid spectral change need
not be confined to the spectrum above 1 kHz as Stevens (1975) suggests. It
is something to which an F1 transition, relieved of the burden of
characterizing [+voiced], contributes.



Perceptual Integration and Selective Attention in Speech Perception: Further
Experiments on Intervocalic Stop Consonants*

Bruno H. Repp



ABSTRACT 

Three experiments on the perceptual interaction between implosive
and explosive formant transitions of intervocalic stop consonants
were conducted using synthetic VCV utterances. Experiment I
demonstrates that implosive transitions are difficult to perceive
correctly when followed by a steady-state vowel after a short silent
interval (closure). Thus, perception of the stop is interfered with
even when no conflicting explosive transitions follow the closure
period. The same experiment also shows that VCV stimuli in which
the implosive transitions are followed by conflicting explosive
transitions are difficult to discriminate from stimuli in which the
implosive transitions are phonetically compatible with the explosive
transitions or absent altogether, as long as the closure duration is
sufficiently short. Thus, the interference effect is as pronounced
in terms of discrimination performance as it is in identification.
Experiment II, a reaction-time (RT) task, replicates the finding
that "same" judgments about the medial consonants in two successive
VCV utterances are faster and more accurate when the final vowels
are the same than when they are different. Eliminating the explosive
transitions does not reduce the effect, not even at relatively
long closure durations, which indicates a general perceptual
integration effect that is not mediated by the acoustic covariation of
explosive transitions with the final vowel. The data suggest that,
in addition to complete stimulus identity -- which apparently is
detected at a prephonetic, holistic stage of processing -- equality
of overall stimulus structure (VC vs. VCV) facilitates "same"
judgments. The size of the perceptual units compared seems to
depend on the structure of the stimulus presented first. Experiment
III investigates perceptual interactions between implosive and
explosive transitions by preceding stimuli from a /be/-/de/ continuum
with either /ab/ or /ad/, or following stimuli from an /ab/-/ad/
continuum with either /be/ or /de/. Precursor/postcursor effects on
the stimuli from the acoustic continua are measured on a six-point
rating scale. At a closure duration of 25 msec, the implosive
transitions exert a pronounced assimilative effect on the perception
of the explosive transitions, although the former are not perceived
as a separate phonemic event. At a closure duration of 265 msec,
explosive transitions exert a slight contrastive effect on implosive
transitions (now perceived as a separate consonant) but not vice
versa.

The present experiments continue and extend research reported earlier by
Dorman, Raphael, Liberman, and Repp (1975) and Repp (1975, 1976a, 1976b).
For a general introduction, the reader is referred to these earlier articles.

EXPERIMENT I

This experiment had two parts: an identification task and a discrimination
task. Dorman et al. (1975) demonstrated that a VC1-C2V utterance, for
example, /eb-de/, tends to be perceived as VC2V (that is, /ede/) if the stop
closure interval is artificially shortened. A period of 50-80 msec of
silence is needed to identify C2 correctly. In an experiment using synthetic
stimuli similar to those used in the present studies, Becky Treiman^1 found
asymptotic identification performance at a closure period of 60 msec.
Informal observations of my own showed that it is not absolutely necessary to
follow the implosive transitions (C1) with conflicting explosive transitions
(C2) for the perception of C1 to be impaired; similar interference also
seemed to occur in VC-V utterances, that is, when the implosive transitions
were followed (after a short period of silence) by a steady-state vowel that
did not provide conflicting information about the place of articulation of
the stop consonant. This effect was to be demonstrated more formally in Task
1 of the present experiment.

The results of Dorman et al. and Treiman were obtained in identification
tasks where the subjects simply wrote down what they heard. While /eb-de/
with a very short closure period may sound like /ede/, the question remains
whether it sounds exactly like a "real" /ede/ (that is, /ed-de/) with the
same closure period. Although the two utterances are phonetically alike,
they may still have a discriminably different auditory quality, or one may be
a less convincing instance of /ede/ than the other. Recently, I demonstrated
(Repp, 1976b) that it is very difficult to discriminate /ed-de/ from /e-de/,
where the implosive transitions have been substituted with steady-state vowel
formants. Performance in this task approached chance at a closure period of
65 msec, the shortest interval used in this earlier study. To explore both
issues further, three types of utterances were tested for their
discriminability from each other in Task 2 of the present experiment. The three
stimulus types were VC-CV (for example, /ab-de/, heard as /ade/ at short
closure intervals), V(C)-CV (for example, /ad-de/, which is always heard as
/ade/ at the closure durations used here), and V-CV (for example, /a-de/,
which is heard as /ade/ at short closure durations and as /a-de/ -- with a
perceptible pause between initial vowel and consonant -- at longer closure
durations). These stimuli differed only in the portions immediately preceding
the silent closure interval: the implosive transitions were either
incompatible with the explosive transitions (VC-CV), compatible (V(C)-CV), or
completely absent (V-CV). Task 2 of Experiment I was designed to determine



^1This experiment was conducted by Ms. Treiman, with my assistance, to fulfill
a course requirement at Yale University. No formal write-up is available.

the functions that relate discrimination accuracy to closure duration for the
three pairs of stimulus types.

Method

Subjects. The subjects were 10 relatively inexperienced listeners and
myself.^2 Eight of the subjects had previously participated in Experiment II
(described later in this paper) and thus had been exposed to stimuli very
similar to those in the present experiment.

Stimuli. The stimuli were derived from those used in my earlier studies
(see Repp, 1976b, for details). The basic stimuli were /abe/, /abi/, /ade/,
and /adi/, synthesized on the Haskins Laboratories parallel formant
synthesizer. In the stimuli for Task 1, the explosive transitions of the medial
stop consonant were replaced with the steady-state formants of the following
vowel. This resulted in /ab-e/, /ab-i/, /ad-e/, and /ad-i/, that is, /ab/
and /ad/ followed by either /e/ or /i/ after a variable closure duration. The
closure intervals ranged from 0 to 125 msec in 25-msec steps. The resulting
24 stimuli were recorded in five different randomizations with interstimulus
intervals (ISIs) of 3 sec.^3 The series was preceded by a random sequence of
10 /ab/ and 10 /ad/ syllables.
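
The Task 1 stimulus set can be enumerated in a few lines, which confirms the count of 24 (2 VC portions x 2 final vowels x 6 closure intervals). This is only a sketch: the variable names, the seed, and the use of Python's `random` module are assumptions for illustration and say nothing about how the original tapes were randomized.

```python
from itertools import product
import random

vc_portions = ["ab", "ad"]
final_vowels = ["e", "i"]
closures_msec = list(range(0, 126, 25))  # 0, 25, 50, 75, 100, 125

# Each stimulus: a VC-V label and its closure interval in msec.
stimuli = [(f"/{vc}-{v}/", closure)
           for vc, v, closure in product(vc_portions, final_vowels, closures_msec)]

def randomizations(items, n, seed=0):
    """Return n independently shuffled copies of the stimulus list."""
    rng = random.Random(seed)
    return [rng.sample(items, len(items)) for _ in range(n)]

runs = randomizations(stimuli, 5)  # five different randomizations
```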

Three types of stimuli were prepared for Task 2. The original stimuli
represented the V(C)-CV set in which both implosive and explosive transitions
were appropriate for the same place of articulation. VC-CV utterances
/ab-de/, /ab-di/, /ad-be/, and /ad-bi/ were obtained by interchanging the VC
portions of the V(C)-CV stimuli. V-CV stimuli /a-be/, /a-bi/, /a-de/, and
/a-di/ were obtained by replacing the implosive transitions with the
steady-state formants of the initial vowel, holding formant amplitudes constant.
The closure durations used ranged from 0 to 100 msec in 25-msec steps, except
for the practice trials, where the stimuli had a 250-msec closure interval.

There were three discrimination conditions: V(C)-CV vs. VC-CV, V(C)-CV
vs. V-CV, and VC-CV vs. V-CV. An AXB paradigm was used, that is, the first
and the last stimulus in a triad were always different from each other, and
the second stimulus was identical with either the first or the third. The
stimuli in a triad always had the same closure duration and the same CV
portions; they differed only in the information immediately preceding the
closure interval. In each condition, there were 80 AXB triads resulting
from 4 stimuli with 5 closure durations in 4 AXB configurations (AAB, ABB,
BAA, BBA). Each series was randomized and recorded as a separate block,
preceded by 16 practice trials (stimuli with 250-msec closure intervals).



^2My own data were included because they were not qualitatively different from
those of the other subjects (although I made fewer errors) and because they
had also been included in Experiment II of Repp (1976b), which was to be
compared with the present results.

^3Note that, in this paper, the term ISI never denotes the brief silent
interval between the VC and CV (or V) portions of stimuli, which is always
referred to as closure interval.



The within-triad ISI was 1 sec; the between-triad ISI, 3 sec.
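
The AXB design described above can be written out as an enumeration, which reproduces the count of 80 triads per condition (4 stimuli x 5 closure durations x 4 configurations). The function name `build_triads` and the dictionary layout are hypothetical; only the factor levels come from the text.

```python
from itertools import product

cv_portions = ["be", "bi", "de", "di"]   # the four basic stimuli, by CV portion
closures_msec = [0, 25, 50, 75, 100]
configurations = ["AAB", "ABB", "BAA", "BBA"]

def build_triads(type_a, type_b):
    """Enumerate the AXB triads for one discrimination condition.

    Within a triad the CV portion and closure duration are fixed; only the
    stimulus type (the part preceding the closure) differs between A and B.
    """
    triads = []
    for cv, closure, config in product(cv_portions, closures_msec, configurations):
        members = tuple(type_a if slot == "A" else type_b for slot in config)
        triads.append({"cv": cv, "closure_msec": closure, "triad": members})
    return triads

triads = build_triads("V(C)-CV", "VC-CV")
```

In every configuration the first and last members differ and the middle member matches one of them, as the paradigm requires.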

Procedure. All subjects first did the identification task. After
listening to the short practice list of VC syllables, the subjects listened
twice to the VC-V identification series. The first time they were instructed
to write down B when they heard /ab-e/ or /ab-i/, D when they heard /ad-e/ or
/ad-i/, and 0 when they heard no consonant at all, that is, /a-e/ or /a-i/.
In the second run, 0-responses were no longer permitted, and a forced choice
between B and D had to be made for each stimulus.

The sequence of the three discrimination conditions was approximately
counterbalanced across subjects. The structure of the stimuli was explained
before each condition, so that the subjects knew quite well what they were
listening to and what they were trying to discriminate. The responses were A
and B, whichever the X stimulus in the AXB triad equalled.

Other procedural details were the same as in previous studies (see Repp,
1976b).

Results

Task 1: VC-V Identification. All subjects identified the 20 practice
VC syllables without difficulty. Only a single error was committed. The
results of the first run through the VC-V identification task are shown in
Figure 1a. The dashed line represents 0-responses, the solid line the total
error rate (that is, 0-responses plus confusion errors). The difference
between the two functions is the percentage of confusions, which did not
change at all with closure duration. The percentage of 0-responses declined
rapidly over the first 25 msec and then more slowly.

Estimates of performance level in Run 1 may be obtained by assuming
that, if forced to guess instead of responding 0, the subjects would have
been correct 50 percent of the time. These estimates are shown in Figure 1b
together with the results of the second run (where 0-responses were not
permitted). It can be seen that performance was close to chance when there
was no closure interval at all, but it improved rapidly as closure duration
increased. An asymptote seemed to be reached at a closure duration of 100
msec; however, note that the asymptotic error rate was much higher than for
VC syllables in isolation. Performance in Run 2 was better than in Run 1.
This may reflect not only practice effects, but also the incorrectness of the
assumption that all 0-responses were equivalent to random guesses.
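
The guessing correction used for the Run 1 estimates amounts to adding half of the 0-responses to the confusion errors. A minimal sketch follows; the function name and the example percentages are hypothetical and are not read from Figure 1.

```python
def estimated_error_rate(confusion_pct, zero_response_pct, guess_accuracy=0.5):
    """Error rate expected if every 0-response were replaced by a guess
    that is correct with probability guess_accuracy."""
    return confusion_pct + (1.0 - guess_accuracy) * zero_response_pct

# Hypothetical percentages, chosen only to show the arithmetic.
estimate = estimated_error_rate(confusion_pct=10.0, zero_response_pct=40.0)
```

If listeners who responded 0 actually retained some information about the consonant, `guess_accuracy` would exceed 0.5 and the estimate would fall, which is the possibility raised in the last sentence above.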

A 4-way analysis of variance was performed on the data in Figure 1b,
with consonant and (final) vowel as additional factors. The effects of runs
(F1,10 = 26.6, p < .01) and of closure duration (F5,50 = 33.5, p << .01) were
highly significant. In addition, however, there was a significant effect of
consonant (F1,10 = 5.5, p < .05) and a highly significant consonant x vowel
interaction (F1,10 = 17.2, p < .01). This interaction is shown in Figure 2.
It is evident that, when followed by a vowel, /ab/ was much easier to
identify than /ad/, especially at longer closure durations, where /ad/
stimuli were solely responsible for the high error rates. In addition,
/ad-i/ was much more difficult than /ad-e/, but /ab-e/ was more difficult than
/ab-i/.

Figure 1: VC-V identification errors as a function of closure duration.

Task 2: VCV Discrimination. As expected, the V(C)-CV vs. V-CV
(compatible vs. absent implosive transitions) condition was the most
difficult. This was evident already in the practice trials, where the average
error rate was 16.5 percent in this condition, but only 6.8 and 6.3 percent,
respectively, in the other two conditions. The results for the shorter
closure durations are shown in Figure 3. The compatible-absent condition was
more difficult than the incompatible-absent condition, which in turn was more
difficult than the compatible-incompatible condition. Error rates declined
steadily as closure duration increased, but were still considerably above the
practice trial error rate at the longest closure duration, which thus does
not represent the asymptote.

A 4-way analysis of variance showed not only highly significant effects
of condition (F2,20 = 17.3, p < .01) and closure duration (F4,40 = 38.1,
p << .01), but also a significant effect of vowel (F1,10 = 7.9, p < .05) and
a significant condition x consonant interaction (F2,20 = 13.6, p < .01).
Since none of these effects interacted with closure duration, the data were
collapsed over this factor and each condition was analyzed separately in a
2-way analysis of variance.

In the compatible-absent condition, there was only a significant effect
of consonant (F1,10 = 13.1, p < .01), which is shown in Figure 4a. Quite
obviously, the presence of implosive transitions was much more difficult to
detect in /ab/ (B) than in /ad/ (D). For /ab/, performance remained at
chance level up to a closure duration of 50 msec or more, while for /ad/ the
error percentage decreased almost linearly from the beginning. These results
are in excellent agreement with those of Repp (1976b, Experiment II, Task 3,
Figure 5) where exactly the same difference was found at slightly longer
closure durations.

In the compatible-incompatible condition, there was only a marginally
significant effect of vowel (F1,10 = 5.5, p < .05). Stimuli ending in /-e/
were easier than stimuli ending in /-i/, but this difference was present only
at two closure durations (25 and 75 msec). A similar effect was present in
the incompatible-absent condition but did not quite reach significance (F1,10
= 4.2, p < .10). However, in the latter condition, there was a significant
effect of consonant (F1,10 = 8.9, p < .02). As indicated by the significant
condition x consonant interaction obtained earlier, this effect was in the
opposite direction of that in the compatible-absent condition. However,
since the consonant factor in this earlier analysis reflected the nature of
the explosive transitions (which alone were constant from condition to
condition), it is obvious that the consonant effect in the
incompatible-absent condition (shown in Figure 4b) was the same as that in the
compatible-absent condition in terms of implosive transitions. Thus, the presence of
implosive labial transitions was more difficult to detect, regardless of
whether they were followed by compatible or incompatible explosive
transitions. That the two consonant effects shown in Figure 4 were in fact due to
the implosive transitions and not the explosive transitions is also supported
by the complete absence of a consonant effect in the
compatible-incompatible condition, where the consonant factor reflected only variation
in the explosive transitions.

Figure 4: Differences between consonants (implosive transitions) in two
discrimination tasks. Abscissa in both panels: closure duration (msec).



Discussion 

The VC-V identification task demonstrates that it is not necessary to
follow implosive transitions with explosive transitions to produce
interference at short closure durations. A steady-state vowel is sufficient.^4 No
direct comparison of the two effects was conducted here, but comparison with
the results of Dorman et al. (1975) and Treiman (see Footnote 1) suggests
that the interference of a steady-state vowel is somewhat less severe than
that of incompatible explosive transitions at short closure durations, but
that it extends to longer closure durations. Overall, the two effects do not
seem fundamentally different from each other.

This finding is compatible with both explanations that have been
forwarded for the interference effect. One explanation claims it is true
recognition backward masking, that is, interruption of processing (Massaro,
1975); the other refers to links between perception and production and
assumes that the perceptual system refuses to deal with speechlike sounds
that are impossible to articulate (Dorman et al., 1975; Liberman, 1975). It
is certainly true that a VC-V utterance with a perfectly steady-state final
vowel could not be pronounced with closure durations as short as the ones
used here. There is a more specific implication for the backward-masking
hypothesis: if it is correct, then the interruption of processing of the
implosive transitions probably does not take place in a mechanism specialized
for the perception of stop consonants or place of articulation, since the
masking vowel presumably does not engage this processing mechanism. Rather,
we seem to be dealing with a more general auditory interference with the
perception of implosive transitions.

The large differences due to consonants (/b/ vs. /d/) and vowels (/e/
vs. /i/) were quite surprising. On the basis of the acoustic characteristics
of the target stimuli alone, a consonant effect in the opposite
direction might have been expected. /ab/ differed from /ad/ not only in the
second- and third-formant transitions, but it also had a shorter
first-formant (F1) transition with a higher terminal frequency.^5 Since the F1
transition is an important manner cue, one might have expected /b/ to be less
"stop-like" and therefore more susceptible to interference than /d/.
Instead, /d/ was much more affected by the following vowel. Closer inspection
of the data from Run 1 showed, however, that this difference was
primarily due to genuine misidentifications of /d/ as /b/; there was a much
smaller difference between the two consonants in terms of 0-responses, which
reflect detection of the manner cue. Thus, it was alveolar place of
articulation that specifically suffered from interference.




^4Malmberg (1955) noted long ago that the consonant of VC-V stimuli is
perceptually grouped with the final vowel when the closure duration is
very short; he did not mention any interference effect. This difference may
be ascribed to Malmberg's use of sophisticated listeners, and perhaps to the
identity of the initial and final vowels in his stimuli.

^5This difference was not really intended but somehow crept into the original
stimulus set and then was carried along. It was also present in the earlier
experiments (Repp, 1976a, 1976b) and was eliminated only in the present
Experiments II and III.




As Figure 1a shows, there was virtually no decline in the frequency of genuine
confusions as closure duration increased. Even at a closure duration of 125
msec, a large proportion of /d/ stimuli was labeled /b/, despite the fact
that the VC portions were perfectly identifiable in isolation. This was
surprising since the final /d/ was, in a sense, acoustically more "prominent"
than the final /b/, due to its steeply rising third-formant transitions
(cf. also Repp, 1976b). Precisely for this reason, however, /ad/ perhaps
sounded less natural than /ab/, and this may have been responsible for its
poor identification when followed by a steady-state vowel.

The interaction between consonants and vowels (Figure 2) is even more
intriguing. Closer inspection of the data from Run 1 indicates that the
differential effect of the two vowel masks on /ab/ was entirely due to 0-
responses, while, with /ad/, both 0-responses and genuine errors showed a
large vowel effect. The differential effect of the two vowels on the
detectability of the manner cue (implosive F1 transitions) may have been due
to their different F1 frequencies. However, why did it interact with the
final consonants? As mentioned earlier, /ad/ had a longer F1 transition with
a lower offset frequency than /ab/; /i/ had a lower F1 than /e/. Thus, the
F1 of /i/ (279 Hz) was closer to the F1 offset of /ad/ (381 Hz), and the F1
of /e/ (535 Hz) was closer to the F1 offset of /ab/ (560 Hz). This relative
continuity of F1 may have led to the perceptual illusion of a transition
between two vowels without any intervening silence. We thus arrive at the
admittedly speculative hypothesis that a listener will be less likely to
perceive an implosive F1 transition as a stop manner cue if it points towards
the F1 frequency of a following vowel.
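
The F1 continuity argument rests on simple distance comparisons, which can be
checked with a short sketch. The frequencies are those quoted in the text; the
dictionary layout and the function name closest_vowel are illustrative
assumptions and not part of the original analysis.

```python
# F1 values in Hz as quoted in the text; the data layout is an assumption
# made for illustration only.
VOWEL_F1 = {"i": 279, "e": 535}
VC_F1_OFFSET = {"ad": 381, "ab": 560}


def closest_vowel(vc):
    """Return the vowel whose F1 lies nearest to the F1 offset of the VC."""
    offset = VC_F1_OFFSET[vc]
    return min(VOWEL_F1, key=lambda vowel: abs(VOWEL_F1[vowel] - offset))


if __name__ == "__main__":
    for vc in VC_F1_OFFSET:
        print(vc, "->", closest_vowel(vc))
```

Under these values, the F1 offset of /ad/ lies nearer to /i/ and that of /ab/
nearer to /e/, which is the pairing the continuity hypothesis appeals to.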

The devastating effect of /i/ on the perception of /ad/ remains to be
explained. Perhaps the relative continuity of the second and third formants
(F2 and F3) can provide an explanation. /ad/ had rising implosive F2 and F3
transitions; the F2 offset (1459 Hz) was below the F2 of /e/ (1840 Hz) and
far below the F2 of /i/ (2298 Hz), while the F3 offset (3363 Hz) was above
the F3 of /i/ (3029 Hz) and far above the F3 of /e/ (2527 Hz). A formant
continuity interpretation would be possible only for F3 but not for F2, for
which the relationships are reversed. However, to the degree that the F3
transition was responsible for the somewhat artificial sound of /ad/, its
relative continuity with the F3 of the following vowel (especially /i/) may
have specifically harmed /d/ identification.

Thus, the results point towards frequency-specific interactions between
the implosive transitions and the following vowel. Relative continuity of
formants across the intervening silence seems to make it more difficult to
perceive manner and place conveyed by implosive transitions. This effect is
reasonable from an auditory information processing viewpoint. However, an
interpretation in terms of articulatory relationships may also be possible,
since auditory and articulatory variables are highly correlated.

Turning to the results of the discrimination task, note first that the
general interfering effect of C2 on C1 in VC-CV utterances was confirmed. As
closure duration was decreased, VC-CV became increasingly more difficult to
discriminate from V(C)-CV and V-CV. The extent and time course of the effect
were not only similar to those reported earlier by Dorman et al. (1975), but
they also paralleled the results in the VC-V identification task, confirming
that backward interference by a steady-state vowel and by an incompatible CV
syllable are basically similar effects. The fact that the present "masking"
functions extend over a wider range of closure durations than those of Dorman
et al. (1975) may be due to differences in stimulus structure and methodology.

Despite the fact that V(C)-CV and V-CV utterances were very difficult to
discriminate from each other, VC-CV stimuli were consistently easier to
discriminate from V(C)-CV stimuli than from V-CV stimuli. In the V(C)-CV vs.
VC-CV condition, the difference consisted in a large difference in F2 and F3
and a small difference in F1. In the V-CV vs. VC-CV condition, the
difference in F1 was larger, but that in F2 and F3 was smaller. Apparently,
then, it was the difference in the higher formants that was more important
for discrimination performance. The acoustic differences to be discriminated
in VC-CV vs. V-CV and in V(C)-CV vs. V-CV were about equivalent, and indeed
performance in the two conditions was similar at the two shortest closure
durations (see Figure 4). At longer closure durations, the former condition
had an advantage as the difference in phonetic structure began to emerge.
Thus, the results suggest that discriminations at very short closure
durations were made primarily on the basis of auditory differences (very
inefficiently), while at longer closure durations, phonetic distinctions
played an increasing role. The difference between VC-CV vs. V(C)-CV and
VC-CV vs. V-CV at longer closure durations may also reflect a phonetic factor,
as suggested by my own observations: when V-CV stimuli were paired with
VC-CV stimuli, the VC-CV context sometimes induced the V-CV stimuli to be heard
as VC-CV too, thus reducing discrimination accuracy. In V(C)-CV stimuli, the
presence of (compatible) implosive transitions apparently prevented such
phonetic illusions. It also seems that they did not occur in V(C)-CV vs.
V-CV discrimination, so that the better-than-chance discriminability of these
stimuli must be ascribed to an auditory cue: a slight discontinuity between
initial vowel and consonant in VC-V stimuli that became noticeable as closure
duration increased.

The strong consonant effect in the V(C)-CV vs. V-CV condition replicated
the effect found by Repp (1976b). Most likely, it was due to the perceptible
acoustic difference between the implosive transitions of the two consonants.
The steeper F1 and F3 transitions of /ad/ and its resulting somewhat strident
sound insured its better discriminability from the steady-state vowel. The
consonant effect was less pronounced in the VC-CV vs. V-CV condition, perhaps
because of the higher performance level there, which resulted from the
additional phonetic factor aiding discrimination.

While both the above conditions involved the discrimination of a VC
stimulus from a V stimulus in the presence of a constant CV "mask" (which
amounts to detecting the presence of implosive transitions), the third
condition, V(C)-CV vs. VC-CV, involved the discrimination of two different
types of implosive transitions. Thus, unlike the other two conditions,
exactly the same target discrimination was required on every trial, and only
the CV mask varied. In contrast to the variation in implosive transitions,
the variation in explosive transitions had no effect on performance.
EXPERIMENT II

This experiment was a follow-up to Experiment I of Repp (1976b) and very
similar in design. Repp (1975, 1976a) showed that "same" judgments about the
medial consonants of two successive VCV (that is, V(C)-CV) utterances have
shorter latencies when the final vowels are the same than when they are
different, and that this effect persists when the closure period is
increased. Since the absolute latencies also increased with closure
duration, it seemed that the subjects based their decisions solely on the CV
portions of the stimuli. Repp (1976b, Experiment I) used a design that
randomly mixed VC and VCV utterances, in order to force the listeners to
focus on the implosive transitions. This procedure was successful in so far
as the latencies no longer increased systematically with the closure duration
of the VCV stimuli. Paradoxically, however, the effect of the final vowel on
"same" latencies did not disappear at long closure durations, a result that
could not be explained, since the latencies seemed to indicate that the
subjects relied on the implosive transitions alone, which were independent of
the final vowel.

Repp (1976b, Experiment II) employed a simpler choice-reaction time task
to get at the same problem. By presenting VCV stimuli with and without
implosive transitions (that is, V(C)-CV and V-CV stimuli) and varying closure
duration, I demonstrated that response latencies for deciding whether a
stimulus began with /ab/ or /ad/ increased with closure duration for V-CV
stimuli, but not for V(C)-CV stimuli. Clearly, then, the listeners were
paying selective attention to the VC portion of the stimuli. However,
latencies for isolated VC stimuli were faster than for V(C)-CV stimuli, which
showed that the following CV portion in V(C)-CV stimuli still affected the
decision process.


The alternative, and perhaps more obvious, procedure to investigate the
influence of the CV portion on decisions about the VC portion is to remove
the explosive transitions and compare latencies for V(C)-CV and VC-V stimuli.
This approach was taken in the present experiment, after some hesitation.
While removing the implosive transitions of a V(C)-CV stimulus has little
perceptual consequence (V(C)-CV and V-CV stimuli sound extremely similar at
short closure durations; cf. Experiment I), removing the explosive
transitions has a much more disturbing effect: V(C)-CV and VC-V stimuli sound
different, especially at short closure durations, where the consonant in
VC-V stimuli is difficult to perceive (cf. Experiment I). Thus, high error
rates were to be expected, but I nevertheless found the experiment worth
attempting.

The present experiment consisted of three tasks. Task 1 served to
familiarize the listener with the basic target stimuli; it required a simple
forced-choice classification of the two standard VC syllables, /ab/ and /ad/.
Task 2 was also a consonant classification task, but here most of the VC
targets were followed by either a phonetically compatible CV syllable or by a
steady-state vowel, after one of two closure intervals. It was expected that
whatever influence the explosive transitions exerted on consonant judgments
would be absent in VC-V stimuli, so that latencies were expected to be faster
for VC-V stimuli than for V(C)-CV stimuli. However, since the intelligibility
of the VC-V consonant suffered at short closure durations, it was
considered possible that the faster latencies for VC-V stimuli would emerge
only at the longer closure duration.

Task 3 was a same-different reaction time (RT) task. Here, as in the
previous experiments (Repp, 1975, 1976a, 1976b, Experiment I, Task 3), the
effect of principal interest was the influence of the final vowel on the
latency of "same" judgments. The design included V(C)-CV and VC-V stimuli
with two closure durations, as well as VC stimuli, in various combinations
with each other. Repp (1976b) hypothesized that, in V(C)-CV pairs, the
subjects compared the explosive transitions instead of the implosive
transitions on some trials, leading to an effect of the final vowel even at long
closure durations. If this interpretation is correct, the final-vowel effect
should disappear in VC-V pairs that do not contain any explosive transitions.
On the other hand, if the effect of the final vowel is due to some more
general perceptual integration, it should be present in VC-V stimulus pairs
as well (perhaps in reduced magnitude). Again, some effect at short closure
durations was to be expected simply because of the interfering effect of the
final vowel; the more interesting condition was the long closure duration.

Although these hypotheses were formulated in terms of latencies, the
experiment contained a safeguard against the possibility that RTs would show
too much variability due to the relative difficulty of the task for
inexperienced listeners. Earlier experiments have shown that error rates are
highly correlated with latencies in this type of task, and as task difficulty
increases, they become a more reliable dependent variable than the latencies
themselves. Most of the hypotheses could therefore be replaced by
substituting "fewer errors" for "faster latencies". As it turned out, I had to rely
heavily on the error rates in interpreting the results of the present
experiment.

Method

Subjects. Ten volunteer subjects participated, all of them relatively
inexperienced in this type of experiment.

Stimuli. The same basic set of V(C)-CV stimuli was used as in the
earlier experiments (/abe/, /abi/, /ade/, /adi/). VC-V stimuli were generated
by replacing the explosive transitions with steady-state vowel formants,
as in Task 1 of Experiment I. Closure durations were 100 and 250 msec. VC
stimuli consisted only of the stimulus portions preceding the silent closure
interval. One slight difference between the present stimuli and those of
earlier experiments was that the F1 transitions of /ab/, originally shorter
than those of /ad/, were made equally long. While this may have increased
the detectability of implosive labial transitions in V(C)-CV stimuli (cf.
Figure 4a), it hardly affected the intelligibility of VC-V stimuli in which,
primarily, /ad/ suffered from the following vowel (cf. Figure 2).

The initial VC list (Task 1) contained 50 stimuli in random order with
ISIs of 3,555 msec. The choice-RT sequence (Task 2) contained 100 stimuli
presented in five individually randomized blocks of 20. Each block contained
16 VCV stimuli (four basic stimuli with or without explosive transitions at
two closure durations) and 4 VC stimuli. The ISI covaried with closure
duration and stimulus type; it was the stimulus onset (or VC offset)
asynchrony that was held constant at 3,740 msec. The tape for Task 3
contained two individually randomized blocks of 144 stimulus pairs. Each
block contained all pairwise combinations of the four V(C)-CV stimuli and all
pairwise combinations of the four VC-V stimuli at each of the two closure
durations, resulting in 2 x 2 x 16 = 64 stimulus pairs; plus all combinations
of the two VC stimuli with all V(C)-CV and VC-V stimuli at each closure
duration, resulting in another 2 x 2 x 16 = 64 stimulus pairs; plus four
replications of the four VC combinations. Note that the two stimuli in a VCV
pair always were of the same type (V(C)-CV or VC-V) and had the same closure
duration. The within-pair onset asynchrony was constant at 1 sec; the
between-pair onset asynchrony (from the onset of the second stimulus in a
pair to the onset of the first stimulus of the next pair) was 3,740 msec.
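
The composition of one Task 3 block can be reconstructed by enumeration. The
following is a minimal sketch, not the original tape-generation procedure: the
tuple layout, the labels, and the function name build_block are hypothetical,
and it assumes that the 16 mixed pairs per cell arise from 2 VC stimuli x 4
VCV stimuli x 2 temporal orders.

```python
from itertools import product

VCV = ["abe", "abi", "ade", "adi"]      # the four basic VCV stimuli
VC = ["ab", "ad"]                       # the two VC stimuli
STIMULUS_TYPES = ["V(C)-CV", "VC-V"]    # with / without explosive transitions
CLOSURES_MSEC = [100, 250]


def build_block():
    """Enumerate the stimulus pairs of one Task 3 block (before randomization)."""
    pairs = []
    for stim_type, closure in product(STIMULUS_TYPES, CLOSURES_MSEC):
        # Homogeneous VCV pairs: 4 x 4 = 16 per type and closure duration.
        for first, second in product(VCV, VCV):
            pairs.append(("VCV-VCV", stim_type, closure, first, second))
        # Mixed pairs, VC first or second: 2 x 4 x 2 = 16 per cell (assumed).
        for vc, vcv in product(VC, VCV):
            pairs.append(("VC-VCV", stim_type, closure, vc, vcv))
            pairs.append(("VCV-VC", stim_type, closure, vcv, vc))
    # Four replications of the four VC-VC combinations.
    for _ in range(4):
        for first, second in product(VC, VC):
            pairs.append(("VC-VC", None, None, first, second))
    return pairs


if __name__ == "__main__":
    block = build_block()
    print(len(block))       # pairs per block
    print(4 * len(block))   # pairs per subject when both blocks are heard twice
```

The enumeration yields 64 + 64 + 16 pairs per block, in agreement with the
block size of 144 stated above.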

Procedure. Equipment, procedure, and analysis were almost exactly
identical to those of Repp (1976b, Experiment I). Only the main features
shall be repeated here. In Tasks 1 and 2, the subjects pressed one response
key for /ab/ and the other for /ad/, ignoring the final vowel, if present.
The response-hand assignment was varied from subject to subject. In Task 3,
all subjects responded "same" with the (preferred) right hand and "different"
with the left. The subjects were instructed to respond as quickly as
possible, to ignore the final vowels, and not to wait for the end of an
utterance before responding. It was mentioned that some stimuli might be a
little more difficult to identify than others. Subjects were asked to
"correct" their own errors (if realized) by quickly pressing the other key.
(This procedure was found useful in earlier studies but had been neglected in
the earlier experiments of this series.) Each subject listened to the two
blocks twice, that is, to 4 x 144 = 576 stimulus pairs altogether. All tasks
were preceded by a few minutes of practice selected randomly from the tapes.

Data analysis was conducted on the median RTs of correct responses
calculated from 25 stimulus replications in Task 1, from 5 replications (10
for VC stimuli) in Task 2, and from 8 responses (16 for VC pairs) in Task 3.
These eight responses in Task 3 resulted from cross-classifying the responses
according to the factors blocks (1 and 2 vs. 3 and 4), stimulus types
(V(C)-CV vs. VC-V), closure duration (100 vs. 250 msec), same/different
consonant, and same/different vowel, which left eight responses per cell. Pairs
containing VC stimuli were analyzed separately from the other (structurally
homogeneous) pairs; the factorial design was similar, except that temporal
order (VC first or second) replaced the same/different vowel factor. VC
pairs were not included in this analysis.

RTs were measured from VC offset in each case. Errors corrected by the
subjects themselves were omitted from analysis, since earlier studies had
indicated that they were mostly due to response anticipations or response
hand confusions and not related to the experimental conditions. Except for
individual differences in frequency, they showed no obvious pattern in the
present experiment either.
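
The scoring rule described above (median RT of correct responses per cell,
with self-corrected errors set aside) can be sketched as follows. This is an
illustration under stated assumptions, not the original analysis program: the
trial record fields (cell, target, response, rt_msec, corrected) and the
function name cell_medians are hypothetical.

```python
from collections import defaultdict
from statistics import median


def cell_medians(trials):
    """Median RT of correct responses per design cell.

    Each trial is a dict with hypothetical keys:
      cell      - label of the factorial cell the trial belongs to
      target    - the correct response
      response  - the response actually given first
      rt_msec   - latency measured from VC offset
      corrected - True if the subject corrected the response afterwards
    """
    rts_by_cell = defaultdict(list)
    for trial in trials:
        if trial["corrected"]:
            continue  # self-corrected errors are omitted from analysis
        if trial["response"] != trial["target"]:
            continue  # uncorrected errors count toward error rates, not RTs
        rts_by_cell[trial["cell"]].append(trial["rt_msec"])
    return {cell: median(rts) for cell, rts in rts_by_cell.items()}
```

Medians rather than means are used per cell, as in the text, which limits the
influence of occasional very long latencies when a cell holds only a handful
of responses.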

Results

Task 1: VC Classification. The subjects had little difficulty in
classifying the VC syllables in isolation. The overall error rate was 1.8
percent, excluding corrected errors (2.0 percent). The errors consisted of 8
errors with /ad/ (3.2 percent) and only 1 error with /ab/ (0.4 percent). RTs
were faster to /ab/ (368 msec) than to /ad/ (410 msec). This difference was
shown by eight of the ten subjects and was significant (F(1,10) = 13.21,
p < .01). It is in the opposite direction of the difference found by Repp
(1976b, Experiment I, Task 1). In fact, while the average RTs to /ad/ are
comparable in the two studies, those to /ab/ were faster in the present study
by over 100 msec. This difference most likely reflects the change in the
transition of /ab/.
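
The Task 1 percentages follow directly from the trial counts. The sketch below
is only an arithmetic check; it assumes that the data of all ten subjects were
pooled and that each subject heard each syllable 25 times (50 stimuli, two
syllables). The helper name percent is hypothetical.

```python
SUBJECTS = 10
REPLICATIONS_PER_SYLLABLE = 25  # 50 Task 1 stimuli split over /ab/ and /ad/
TRIALS_PER_SYLLABLE = SUBJECTS * REPLICATIONS_PER_SYLLABLE  # pooled trials


def percent(errors, trials):
    """Error rate in percent, rounded to one decimal place."""
    return round(100.0 * errors / trials, 1)


if __name__ == "__main__":
    print(percent(8, TRIALS_PER_SYLLABLE))          # /ad/ errors
    print(percent(1, TRIALS_PER_SYLLABLE))          # /ab/ errors
    print(percent(8 + 1, 2 * TRIALS_PER_SYLLABLE))  # overall
```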

Task 2: Choice-RT Task. The results of the choice-RT task are shown in
Figure 5. Figure 5a shows the latencies, Figure 5b the error rates. Both
figures show an interaction between stimulus type and closure duration.
While closure duration had relatively little effect in V(C)-CV stimuli,
performance with VC-V stimuli was much better at the long closure duration
than at the short one. This was expected, because the final vowel interfered
with the perception of the implosive transitions at the 100-msec closure
duration; the error rate was correspondingly high. It is interesting,
however, that at the 100-msec closure duration, VC-V RTs were hardly longer
than V(C)-CV RTs, despite the large difference in error rates, and at the
250-msec closure duration, VC-V RTs were actually faster than V(C)-CV RTs,
although VC-V stimuli continued to exhibit a slightly higher error rate.
Thus, although error rates and latencies tend to be positively correlated,
sometimes one measure shows a difference where the other does not (cf. Repp,
1976b, for similar observations).

Unfortunately, the RT effects did not reach significance due to large
individual differences and high variability. A 4-way analysis of variance
(stimulus types, closure durations, consonants, vowels) yielded no
significant effects. Transformations of the data or eliminating subjects with
exceptionally long RTs did not help. Thus, no firm conclusions can be drawn
from the RT pattern in Figure 5a.

The error patterns were more consistent, although the majority of the
errors was contributed by a few subjects. The overall error rate was 9.5
percent, excluding corrected errors (3.5 percent). In addition to the
effects of stimulus type and closure duration evident in Figure 5b, there
were the expected large differences between individual stimuli: /adi/ (26.0
percent), /ade/ (10.5 percent), /abe/ (3.0 percent), /abi/ (2.0 percent).
Thus, the large majority of the errors consisted in alveolar-to-labial
confusions. For VC-V stimuli with a closure duration of 100 msec, the error
rates for the four individual stimuli were 42.0, 34.0, 6.0, and 10.0 percent,
respectively, considerably higher than in Experiment I, Task 1 (Figure 2).
This difference probably reflects the more stringent demands of the present
task and perhaps context effects; however, the pattern agrees with the
results shown in Figure 2.

Error rates for VC stimuli were comparable to those for other stimuli at
the longer closure duration (Figure 5b). However, RTs tended to be faster
for VC stimuli than for VCV stimuli (Figure 5a). VC stimuli in Task 2
exhibited both higher error rates and slower RTs than the VC stimuli in Task
1, a context effect also obtained by Repp (1976b).
Figure 5: Mean latencies of correct responses and error rates in the
choice-RT task as a function of closure duration.



Task 3: Same-Different Reaction Time. The results of Task 3 are shown
in Figures 6 and 7. Figure 6 shows the data for VCV pairs (that is, pairs
not containing VC stimuli). It has four panels: panels a and b (above) show
RTs, panels c and d (below) the corresponding error rates. Panels a and c
(on the left) are for V(C)-CV stimuli, panels b and d (on the right) are for
VC-V stimuli.

The latency data were analyzed in a 5-way analysis of variance that
yielded several significant effects. However, the effects that were not
significant provided equally interesting information: there was no
significant practice (block) effect, no significant overall increase in RTs with
closure duration, no significant overall difference between V(C)-CV and VC-V
pairs, and (surprisingly) no significant difference between "same" and
"different" RTs (that were confounded with right vs. left response hand).
The only main effect that reached significance was that of same/different
vowel (F1,9 = 8.43, p < .05), with faster overall latencies when vowels were
the same. Several higher-order interactions reached significance but do not
merit extensive discussion. They were primarily due to the precipitous
decline in VC-V "different" latencies with closure duration where the vowels
were the same (cf. Figure 6b).

It is evident from Figures 6a and 6b that both stimulus types showed an
effect of the final vowel on "same" latencies. The effect was in the
expected direction (faster RTs when the vowels were the same) and did not
decrease as closure duration increased. The effect of the final vowel on
"different" latencies was not consistent, on the other hand, and seemed to
interact with stimulus types as well as closure duration. The result that
the final vowels had a consistent effect on "same" responses only is in
agreement with earlier experiments, and so is the absence of a decline of
this effect as closure duration increased.

To clarify the statistical reliability of this effect, a separate
analysis of variance was conducted on "same" latencies only. The main effect
of same/different vowel reached significance (F1,9 = 5.13, p < .05) and did
not interact with any other factor. The only other significant effect was an
uninterpretable 3-way interaction between the other three factors (blocks,
closure durations, stimulus types).

Because of the high error rates and the great variability of the
latencies, the error pattern was likely to provide a more direct and
consistent indicator of the major experimental effects. Figures 6c and 6d
show quite clearly that (1) more errors were made on "different" trials than
on "same" trials (that is, incorrect "same" responses were more frequent than
incorrect "different" responses), (2) "different" trials had much higher
error rates with VC-V pairs than with V(C)-CV pairs, (3) "different" errors
(that is, incorrect "same" responses) decreased as closure duration
increased, but "same" errors remained roughly constant, and (4) the
same/different vowel factor had a clear effect only on "same" errors and was
independent of closure duration and stimulus type. Error and latency
patterns for "same" responses are in good agreement, which in part reflects
the greater reliability of "same" latencies because of the lower error rates
on "same" trials. There was no increase in accuracy over blocks. All
effects just mentioned were highly significant in an analysis of variance,
but since this analysis was not quite legitimate because of the highly
asymmetric distribution of the error scores, detailed results will not be
reported here.

Figure 7: Mean latencies of correct responses and error rates in the
same-different task: V(C)-CV and VC-V stimuli paired with VC stimuli,
and VC pairs.

The results for stimulus pairs containing VC stimuli are shown in Figure
7, with panels arranged as in Figure 6. The RTs (Figures 7a and 7b) deserve
little comment, for, despite their apparent orderliness, a 5-way analysis of
variance did not reveal a single significant effect. (Not even the
interaction of same/different consonant with closure duration approached
significance.) Latencies for VC pairs seemed to be faster than for other stimulus
pairs (Figure 7a); however, this difference was not tested for significance.

It was again in the error rates that differences emerged more clearly.
Figures 7c and 7d show that pairs in which the VC stimulus came first and
which had identical consonants had much lower error rates than other stimulus
combinations. Thus temporal order of the stimuli in a pair clearly made a
difference for "same" responses; for "different" responses, a similar effect
was observed at the longer closure duration only. (Note that these effects
tended to be reversed in terms of RTs; however, a true speed-accuracy
trade-off could hardly underlie this inconsistency.) Except for the steep increase
in errors for pairs containing VC-V stimuli at the shorter closure duration,
the error patterns for the two stimulus types were quite similar. Again, no
practice effects were evident. All relevant effects were significant in an
analysis of variance.

Although RTs for VC pairs tended to be faster, their error rates were
comparable to those for most other pairs at the longer closure duration. At
the 250-msec closure duration, only pairs of VC-V stimuli had highly elevated
error rates (Figure 6d): following both target consonants with irrelevant
vowels introduced a strong tendency to respond "same" to consonants that
actually were different, regardless of whether the two vowels were the same
or not.

Discussion

As far as RTs are concerned, this experiment was not particularly
successful. Inter- and intra-subject variability was too great and error
rates too high to lead to useful results, apart from the marginally
significant vowel effect in Task 3. However, if the view is accepted that
the error rates convey very much the same information as the latencies, the
relatively greater consistency of the error patterns permits one to draw
conclusions that originally were to be based on the RTs. It should be noted
that these conclusions apply only to relatively inexperienced listeners; so
far, there is little evidence that practiced listeners are sensitive to the
context following VC targets in any systematic way.⁶

The principal result is the effect of the relationship of the final
vowels on "same" judgments about the stop consonants in pairs of VCV
utterances. As in the earlier studies by Repp (1976a, 1976b), the effect
persisted even at a relatively long closure period. In addition, the present
experiment shows that it is equally present in VC-V stimuli which do not
contain any explosive transitions. This rules out the hypothesis that the
explosive transitions mediated the effect of the final vowel.

⁶See Repp (1976b). I also served as a subject in the present experiment and
showed no systematic effects of context (at least, no effects consistent
with those shown by inexperienced listeners).

It will be recalled that Pisoni and Tash (1974) and Wood and Day (1975)
demonstrated effects of the final vowel on judgments about syllable-initial
stop consonants (explosive transitions). I hypothesized (Repp, 1975) that
this effect was due to the acoustic variation of explosive transitions with
the following vowel, and I demonstrated that the effect is also obtained in
V(C)-CV utterances, where part of the consonantal information (the implosive
transitions) is independent of the final vowel. However, the explanation was
always possible that the listeners simply ignored the implosive transitions,
alternated between basing their decisions on implosive or explosive
transitions, or perceptually integrated these cues because they signalled the
same place of articulation. These interpretations no longer seem tenable.
In VC-V stimuli, a final vowel containing no consonantal cues whatsoever
biases judgments about events that are acoustically independent of it and
occur as much as 250 msec earlier. This effect is of the same magnitude as
that obtained in V(C)-CV stimuli, which suggests that it is of a more general
nature and does not depend on the "connectedness" of portions of the signal
by phonetically compatible cues. Rather, the perception of certain acoustic
cues seems to be sensitive to any speech information that follows within a
considerable time span. Of course, this time span will depend on a number
increased considerably, especially on VC-V "different" trials. In comparing
VCV-VC pairs (Figures 7a and 7b) with VCV-VCV pairs (Figures 6a and 6b),
exactly the same manipulation, adding a final vowel to the second stimulus,
the same had little effect at the longer closure duration,
error rates at the shorter closure duration. This effect is



were 



for although the difference between the two stimuli increased,
the case of VC-V stimuli), fewer errors were committed on
The fact that the overall structure of the two stimuli became
more similar may have been more important than the precise relationship of
the final vowels, although the latter, of course, had an additional effect.
When the target consonants were different, adding a final CV portion to the
second stimulus increased errors slightly, while adding a final vowel only

magnitudes 
of these effects must have been due to the presence vs. absence of explosive
transitions which conveyed relevant consonantal information. It was most
surprising that the addition of a final vowel had such a large effect in VC-V
stimuli at the 250-msec closure duration. Apart from this effect, the
results may be cautiously interpreted to show two factors at work:
similarity of overall stimulus structure, which played a role only at the
short closure duration (suggesting that, at the longer closure duration, the
two portions of each stimulus no longer formed one perceptual unit or chunk),
and complete identity, which was effective at both closure durations.

Consider now the effect of adding a final vowel or CV portion to the
first stimulus in a pair. Doing this to a VC-VC pair results in a VCV-VC
pair. The effect is an increase in errors on "same" trials but not on
"different" trials, except at the short closure duration for VC-V stimuli
(Figure 7d). In each case, the manipulation eliminates the advantage of
"same" trials, which apparently requires that two identical stimulus portions
follow directly upon each other. (Note that the advantage of "same" trials
was found in VC-VCV pairs, where no auditory information intervened between
the two identical VC portions.) When a vowel or CV portion is added to the
first stimulus in a VC-VCV pair, a VCV-VCV pair results in which the overall
stimulus structure of the two stimuli is equal. On "same" trials, the error
rates for VC-VCV pairs are more like those of VCV-VCV pairs with different
vowels at the short closure duration, but like those of VCV-VCV pairs with
identical vowels at the long closure duration. This again suggests that the
final vowel or CV portion formed a perceptual unit with the VC portion at the
shorter closure duration only. In each case, there is an advantage for two
identical perceptual units following directly upon each other, be they VCs or
VCVs. The effect of adding a vowel to the first stimulus on "different"
trials is similar to the effect of adding a vowel to the second stimulus: a
moderate increase in errors for V(C)-CV stimuli, and a large increase for
VC-V stimuli, regardless of closure duration. The increase in errors at the
short closure duration may also reflect a bias towards "same" responses
arising from similarity in overall structure.

The data suggest, then, that the average unpracticed subject processes
the stimuli as follows. All the information that occurs prior to the onset
of the second VC stimulus is phonetically interpreted and stored. The
information beginning with the second VC is first compared to the stored
information in a holistic manner. In this holistic comparison, the size of
the units compared is determined by the total information held in storage,
that is, if the first stimulus was a VCV (even with a closure period of 250
msec), the units to be compared will be VCVs; if it was a VC, they will be
VCs. (In the latter case, if the second VC is followed by further
information after a relatively short interval, the listener may have
difficulty in segregating the VC portion as a unit for comparison.) If the
second unit exactly matches the first unit held in storage, an accurate (and
fast) "same" response is issued. The low error rates for identical VC-V
stimuli with a short closure period suggest that these matches take place at
a prephonetic (auditory) level; otherwise, there should have been more errors
on "same" trials because of the high uncertainty about the phonetic identity
of these stimuli (cf. Experiment I). If the holistic match is negative (or
already while it is being performed), a more analytic comparison is
conducted, most likely between phonemic stimulus representations. The final
vowel in VCV stimuli can be ignored in this comparison, but it may still have
indirect effects on the identification of the stop consonant. These indirect
effects may account for part of the error pattern, such as the striking
difference between VC-V/VC-V "same" (vowel different) trials and VC-V/VC
"same" trials at the short closure duration.

Clearly, this is not a full account of what is going on in the
listener's head in this complex task. Various individual differences and
strategies may be involved. But the availability of a special prephonetic
mode of comparison for the detection of exact identity seems fairly clear.
The good agreement with the results of Repp (1976b) should be noted, in terms
of error rates, at least. There was no consistent effect of closure duration
on RTs when the second stimulus was of the V(C)-CV type. This also agrees
with the earlier results and indicates that the subjects did not make their
decisions solely on the basis of the explosive transitions. Clearly,
however, the explosive transitions were taken into account, as shown by the
difference in "different" error rates between V(C)-CV and VC-V pairs.
Further research yielding cleaner RT data will be needed to gain more insight
into the precise processing strategies employed by listeners in this task.

EXPERIMENT III

This experiment investigated the perceptual interaction between implosive
and explosive transitions in VCV stimuli by a new method: systematic
manipulation of the acoustic structure of the transitions. Consider a VCV
utterance with a short closure duration (for example, 25 msec). The medial
stop is almost always perceived according to the explosive transitions, even
if the implosive transitions are appropriate for a different place of
articulation (cf. Dorman et al., 1975, and the present Experiment I). In
other words, both /ab-de/ and /ad-de/ are perceived as /ade/, and both
/ab-be/ and /ad-be/ are perceived as /abe/, if the closure period is made
sufficiently short. What happens if the explosive transitions are chosen so
that the second syllable is ambiguous between /be/ and /de/ when presented in
isolation? Will it be equally ambiguous when preceded by /ab/ or /ad/ at a
short closure duration? Or will the (unambiguous) implosive transitions
determine the phonetic percept in this case? Their effect could be either
assimilative or contrastive; because of the close contiguity of the
interacting transitions, and since the implosive transitions are not
perceived as a separate phonemic event, an assimilative effect seems more
likely. Such an effect would provide evidence of perceptual integration of
implosive and explosive transitions, while absence of any effect would
support a perceptual interruption hypothesis (Massaro, 1975) or at least
suggest that implosive transitions play no perceptual role at very short
closure durations.

Consider now the reverse case. As the closure duration is increased, a
stimulus like /ab-de/ changes perceptually from /ade/ to /ab-de/. At
comparable closure durations, /ad-de/ remains /ade/ in perception; geminate
consonants (/ad-de/) are perceived only at much longer closure durations
(Repp, 1976b). What happens if the implosive transitions are made ambiguous
between /ab/ and /ad/? When followed by /de/ at an intermediate closure
duration (115 msec, say), will the perceptual result be /ade/ or /ab-de/?
When followed by /be/, will it be /abe/ or /ad-be/? Again, the effect of the
explosive transitions on the perception of the ambiguous implosive
transitions could be either assimilative or contrastive, or absent
altogether. A prediction is much more difficult to make in this case. Again,
an assimilative effect would provide evidence of perceptual integration over
a period as long as the closure duration used.

The method used was to construct acoustic continua of implosive
transitions (/ab/-/ad/) and explosive transitions (/be/-/de/) and to
investigate shifts in the phoneme boundaries on these continua as a function
of the phonetic identity of the preceding (following) transitions. Four
control conditions were included. In two of them, the VC and CV portions were
presented in isolation. In the other two, the VC-CV combination had a
closure duration of 265 msec, so that the implosive transitions were always
perceived as separate phonemic events even when phonetically compatible
with the explosive transitions. (The single-geminate boundary lies around
213 msec; Repp, 1976b.) If there is any perceptual interaction between
implosive and explosive transitions over this long temporal distance, it is
most likely contrastive. A rating scale was used to judge the stimuli, since
it was thought possible that the perceived clarity of a consonant might be
affected by preceding (or following) compatible (or incompatible)
transitions, independently of its perceived identity.

Method 

Subjects. Ten new volunteer subjects participated. I also served as a
subject, but my data were not combined with those of the other subjects.

Stimuli. All stimuli were prepared on the Haskins Laboratories parallel
formant synthesizer. Two stimulus continua were constructed: a VC continuum
of seven syllables ranging perceptually from /ab/ to /ad/, and a CV continuum
of seven syllables ranging perceptually from /be/ to /de/. The stimuli
within each continuum differed only in the offset (onset) frequencies and
trajectories of the second- and third-formant transitions, spaced in equal
steps between the two endpoint stimuli. The stimuli were selected so that
the phoneme boundary would fall approximately in the center of each
continuum. The VC stimuli were 185 msec long, with 35-msec transitions; the
CV stimuli were 300 msec long, with 50-msec transitions (as in the previous
experiments). The VC stimuli all had the same F1 transition as the /ab/
stimuli in Experiment I and earlier experiments (unlike the stimuli in
Experiment II).

Two stimulus tapes were prepared. The CV tape first contained a random
series of 75 CV syllables consisting of the seven CV stimuli with the
following frequency distribution: 5 times (1,2,3,3,3,2,1). This distribution
of stimuli was used to provide more reliable information in the region
of the phoneme boundary and was maintained in all other conditions. The
initial CV series was followed by a series of 150 stimuli consisting of the
same CVs preceded by either /ab/ or /ad/, the two endpoint stimuli of the VC
continuum. The closure interval was 25 msec. Another analogous series of
150 stimuli followed, with a closure period of 265 msec. These sequences
were arranged in successive blocks of 30 stimuli, each containing one cycle
of all stimulus combinations, with the basic stimulus frequency distribution
described above.
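
As an illustration only, the composition of these series can be sketched in
Python. The sketch reproduces the counts given in the text (15 items per
cycle, 75 items in isolation, 150 items in blocks of 30); the function names,
the use of a seeded random generator, and the shuffling of the isolated
series as a whole are assumptions, since the text does not say how the
randomization was actually carried out.

```python
import random

# Relative frequency of the seven continuum steps within one cycle,
# as given in the text: (1, 2, 3, 3, 3, 2, 1), i.e. 15 items per cycle.
CYCLE_WEIGHTS = (1, 2, 3, 3, 3, 2, 1)


def one_cycle():
    """Return the 15 continuum steps (numbered 1-7) that make up one cycle."""
    return [step
            for step, count in enumerate(CYCLE_WEIGHTS, start=1)
            for _ in range(count)]


def isolated_series(n_cycles=5, rng=None):
    """75-item series of syllables in isolation (5 cycles of 15 items).

    Assumption: the whole series is shuffled at once; the text only
    calls it a "random series".
    """
    rng = rng or random.Random()
    items = one_cycle() * n_cycles
    rng.shuffle(items)
    return items


def context_series(contexts=("ab", "ad"), n_blocks=5, rng=None):
    """150-item series built from successive blocks of 30 stimuli.

    Each block holds one full cycle for each context syllable and is
    shuffled separately, so every block contains all combinations.
    """
    rng = rng or random.Random()
    series = []
    for _ in range(n_blocks):
        block = [(ctx, step) for ctx in contexts for step in one_cycle()]
        rng.shuffle(block)
        series.extend(block)
    return series
```

Calling context_series(("be", "de")) would give the corresponding layout for
a VC series with CV endpoint syllables as context.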



The VC tape was exactly analogous. An initial 75-item VC series was
followed by two 150-item VC-CV series in which each VC stimulus was followed
by either /be/ or /de/, the two endpoint stimuli of the CV continuum. The
closure period was 115 msec in the first series and 265 msec in the second.
The CV and VC tapes had identical stimulus randomizations, with reversed
roles of the VC and CV portions.

Procedure. All subjects received the conditions in the same order:
first the CV tape, then the VC tape, and the stimulus sequences in the order
described above. In the initial CV series, the subjects were instructed to
rate each consonant on a scale ranging from 1 to 6, where 1 represented a
"very clear B", 3 "ambiguous, more like a B", 4 "ambiguous, more like a D",
and 6 a "very clear D". Subjects were urged to use the extreme ratings at
least occasionally, that is, to make their judgments according to the
relative goodness of the stimuli and not according to how they compared with
real speech. The subjects were exposed to a portion of the stimulus series
before actually beginning the task. In the following conditions, the
subjects were asked to maintain the criteria established during the initial
series, that is, to give generally poorer ratings if all stimuli sounded
poorer and generally better ratings if all stimuli sounded better. For the
25-msec CV condition, the subjects were merely told that each CV syllable
would be preceded by the vowel /a/; nothing was mentioned about the implosive
transitions. For the 265-msec CV condition, the subjects were told that each
CV syllable would be preceded by either /ab/ or /ad/. These initial
syllables were to be ignored, and only the relative category goodness of the
initial consonant of the second syllable was to be evaluated.

In the VC conditions, the subjects first rated the syllable-final
consonants on the same six-point scale. Then, in the 115-msec condition, a
different response mode was introduced because of the perceptual
heterogeneity of the stimuli (either one or two intervocalic consonants).
Instead of using the rating scale, the subjects wrote down "1" when they
heard a single consonant (/abe/ or /ade/) and "2" when they heard two
different consonants (/ab-de/ or /ad-be/). Finally, in the 265-msec
condition, the rating scale was used again to evaluate the first
(syllable-final) consonant, ignoring the /be/ or /de/ that followed.

The equipment was the same as in previous experiments. All conditions
were administered in a single session of about one hour.

Results

The results of the CV conditions are shown in Figures 8a and 8b. The
dashed lines represent the ratings for CV stimuli in isolation. The other
two functions in each panel of Figure 8 represent responses to CV syllables
preceded by /ab/ and /ad/, respectively.

It is obvious that the VC precursors had an effect in the 25-msec
condition but not in the 265-msec condition. The former effect was
assimilative, as expected, and remarkably consistent from subject to subject,
as reflected in its high significance (F(1,9) = 45.87, p << .01). The
significance test was performed on the difference between the effects of the
two precursors on the ratings; the control data (CVs in isolation) were not
included in this analysis. The precursor effect did not interact with
position on the stimulus continuum; it can be seen in Figure 8a that it was
equally present for each of the seven CV stimuli. The rating functions for
the VC-CV stimuli were not only flatter than the function for CV stimuli in
isolation, but they also reached an earlier asymptote at one end. When the
same results were plotted in terms of the percentage of "D" responses (that
is, the percentage of ratings falling between 4 and 6), the pattern was
identical. This suggests that the implosive transitions simply "got through"
on a percentage of trials. This was not entirely unexpected; at a closure
duration of 25 msec, the perceptual dominance of the explosive transitions is
not perfect (Dorman et al., 1975, and the present Experiment I). One subject
actually heard /ade/ whenever the VC portion was /ad/. Another subject
reported hearing /ab-de/ on a number of trials.

Figure 8: Mean "D-ness" ratings of CV stimuli from the /be/-/de/ continuum
in isolation and when preceded by either of two VC precursors, at
two closure intervals. (The dashed functions in the two panels
are based on the same data.)

Thus, the question arises whether the effect of the VC precursor was
all-or-none or gradual in nature. Did it consistently bias the perception of
the CV portion, or did it just intrude on a certain small number of trials
and have no effect on all others? One way of answering this question is to
make the average ratings conditional on whether they fell between 1 and 3 (B)
or between 4 and 6 (D). These conditional ratings for the 25-msec condition
are shown in Figure 9. Only data points with at least 10 responses in the
relevant category are shown. The entries represent means calculated over all
individual responses of all subjects, that is, different subjects contributed
different numbers of responses, and therefore no statistical analysis could
be conducted. It is evident from Figure 9 that the precursor effect was
reduced in terms of conditional ratings, but a smaller effect in the
predicted direction clearly remained. In other words, /be/ preceded by /ad/
was indeed perceived as a "poorer B" than /be/ preceded by /ab/ or by
silence, and /de/ preceded by /ab/ was perceived as a "poorer D" than /de/
preceded by /ad/ or by silence. We may conclude, then, that the VC precursor
exerted a genuine biasing effect on the perception of the explosive
transitions on most or all trials. Note, however, that preceding an
unambiguous CV with a phonetically compatible VC precursor did not improve
its ratings compared to the same CV syllable in isolation; thus, there was no
positive contribution of the implosive transitions to the perceived clarity
of the consonant.
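
The logic of the conditional ratings can be made concrete with a small
sketch. If the precursor merely intruded on some trials, it would change the
percentage of "D" responses but leave the mean rating within each response
category untouched; a shift of the within-category means points to a graded
bias. The Python below is a hypothetical illustration of the two measures,
not the analysis program actually used; the function names and the handling
of sparse categories (returning None) are assumptions, while the 10-response
cutoff is taken from the text.

```python
def percent_d(ratings):
    """Percentage of ratings that fall in the "D" half of the scale (4-6)."""
    return 100.0 * sum(1 for r in ratings if r >= 4) / len(ratings)


def _mean_or_none(values, min_n):
    """Mean of values, or None when fewer than min_n responses are available."""
    if len(values) < min_n:
        return None
    return sum(values) / len(values)


def conditional_means(ratings, min_n=10):
    """Mean rating computed separately within B (1-3) and D (4-6) responses.

    Ratings are pooled over all individual responses, as in the text;
    a category with fewer than min_n responses is reported as None.
    """
    b_ratings = [r for r in ratings if 1 <= r <= 3]
    d_ratings = [r for r in ratings if 4 <= r <= 6]
    return {"B": _mean_or_none(b_ratings, min_n),
            "D": _mean_or_none(d_ratings, min_n)}
```

Applied to the pooled ratings for one continuum step under each precursor, a
higher "B" mean after /ad/ than after /ab/ would correspond to the "poorer B"
pattern described above.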

It is curious that I was the only listener who showed a precursor effect
in the opposite direction, that is, a contrast effect, although I never
perceived more than a single consonant in the 25-msec condition. It is not
clear why my extensive experience with the stimuli should have led to this
surprising reversal.

The obvious absence of any average precursor effect in the 265-msec CV
condition (Figure 8b) may not be representative of individual listeners. Of
the ten subjects, two showed assimilation effects, five showed contrast
effects, and the remaining three showed irregular effects or none at all. I
showed an assimilation effect. Thus, although some of these effects may just
represent random variation, it seems that the VC precursor did affect the
perception of the CV syllables, but in different directions for different
listeners. At present, the basis of the individual differences is obscure.

The results of the VC rating conditions are shown in Figure 10. Figure
10a shows the 115-msec condition. Here, the responses indicated the number
of consonants heard. The ordinate is labeled percent "D" responses, which
means the percentage of ratings between 4 and 6 for VC syllables in
isolation, the percentage of "2" responses for stimuli followed by /be/, and
the percentage of "1" responses for stimuli followed by /de/. Plotted in
this way, it is evident that the two CV "postcursors" had little differential
effect. Again, however, individual results varied widely, more so than
one would expect from mere random variability. Four subjects were more
likely to hear one consonant with the /be/ postcursor than they were to hear
two consonants with the /de/ postcursor, one subject showed the opposite
effect, and the remaining subjects showed different effects in different
regions of the VC continuum. Such an interaction is weakly evident also in
Figure 10a: at the /ab/ end of the VC continuum, the /be/ function lies
above the /de/ function, and this relationship is reversed as the /ad/ end of
the VC continuum is approached. Seven out of ten subjects showed results at
least partially compatible with this pattern, which, however, is not readily
interpretable and was not statistically significant.
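
The way "1" and "2" responses translate into "D" responses in Figure 10a can
be spelled out as a short sketch. It rests on the reasoning that, with a /be/
postcursor, two different consonants can only be heard if the VC portion is
taken as /ad/ (/ad-be/), whereas with a /de/ postcursor a single consonant
(/ade/) is what results when the VC portion is taken as /ad/. This mapping is
a reading of the description above and of the figure caption, and the
function names are hypothetical.

```python
def implies_d(postcursor, n_consonants):
    """True if a "1" or "2" response implies the VC portion was heard as /ad/.

    postcursor   -- "be" or "de", the CV syllable that followed
    n_consonants -- 1 (single consonant heard) or 2 (two different consonants)
    """
    if postcursor not in ("be", "de"):
        raise ValueError("postcursor must be 'be' or 'de'")
    if n_consonants not in (1, 2):
        raise ValueError("n_consonants must be 1 or 2")
    if postcursor == "be":
        return n_consonants == 2   # two consonants: /ad-be/
    return n_consonants == 1       # one consonant: /ade/


def percent_d_from_counts(postcursor, responses):
    """Percentage of inferred "D" responses among "1"/"2" responses."""
    hits = sum(1 for r in responses if implies_d(postcursor, r))
    return 100.0 * hits / len(responses)
```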


Much more consistent than the differences between the two postcursors
was the difference between the postcursor functions and the function for VC
syllables in isolation (F(1,9) = 5.9, p < .05, for the main effect; F(4,35) =
13.3, p < .01, for the interaction with position on the continuum).¹ The
difference can be broken down into two components: lower asymptotes of the
postcursor functions (at least at the /ab/ end of the VC continuum), and a
general shift in the VC category boundary towards the /ad/ end when a
postcursor followed: No matter which CV syllable followed, the VC portion
was more likely to be perceived as /ab/ than in isolation. The reason for
the first component was probably general uncertainty due to the relative
difficulty of the task. The reason for the second component is not clear,
except that it is reminiscent of the general difficulties subjects had in
perceiving /ad/ correctly in earlier experiments.

I again produced a curious result in the 115-msec condition: I needed a
while to hear any instances of two consonants at all, which made my data
quite useless. (The same happened in a replication of the experiment.)
Warm-up effects of this sort may have played a role with some of the other
subjects, too, although they seemed to have much less trouble.

Finally, the results of the 265-msec VC condition need to be discussed.
They are shown in Figure 10b. (The data of one subject had to be excluded in
this condition because he apparently responded to the CV portions of the
stimuli.) It can be seen that there was a small postcursor effect in the
predicted direction, that is, a contrast effect. Slight contrast effects
were shown by five subjects and myself; the remaining subjects showed no
systematic effects. No listener showed any assimilation effect in this
condition. Due to this relative consistency between subjects, the postcursor
effect reached significance (F(1,8) = 7.97, p < .05), although most
individual effects were smaller than those in the 265-msec CV condition. The
effect did not interact significantly with position on the VC continuum.
Again, there was no evidence of any increase in the perceptual clarity of an
unambiguous VC syllable when followed by a CV syllable.

¹In the statistical analysis, the seven positions were reduced to five by
combining the two positions at each end of the continuum, so that an equal
number of observations was available at each of the resulting five
positions.

Figure 10: Percentage of "D" responses (panel a) and mean "D-ness" ratings
(panel b) for VC stimuli from the /ab/-/ad/ continuum in isolation and when
followed by either of two CV postcursors, at two closure intervals. (The
dashed functions in the two panels are based on the same data but plotted
differently.) Percentages of "D" responses in panel a are inferred from "1"
and "2" responses and ratings given in the task (see text).
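
The pooling of continuum positions mentioned in the footnote to the
statistical analysis follows directly from the stimulus frequency
distribution (1,2,3,3,3,2,1): combining the two positions at each end yields
three presentations per cycle at every one of the five resulting positions.
A minimal Python check of that arithmetic, with hypothetical names, is:

```python
# Presentations per cycle for continuum steps 1-7, as given in the Method.
CYCLE_WEIGHTS = (1, 2, 3, 3, 3, 2, 1)

# Zero-based indices of the steps combined into each of the five positions:
# steps 1+2, step 3, step 4, step 5, steps 6+7.
POOLED_GROUPS = ((0, 1), (2,), (3,), (4,), (5, 6))


def pooled_counts(weights=CYCLE_WEIGHTS, groups=POOLED_GROUPS):
    """Number of presentations per cycle at each pooled position."""
    return [sum(weights[i] for i in group) for group in groups]


print(pooled_counts())  # [3, 3, 3, 3, 3]
```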

Discussion

These results demonstrate that implosive and explosive transitions are
not perceived independently of each other. At short closure durations, the
implosive transitions are perceptually dominated by the explosive transitions
and only a single consonant is heard. Nevertheless, the implosive
transitions bias the perception of the explosive transitions. The conditional
ratings (Figure 9) indicate that this is, at least in part, a genuine
perceptual bias due to perceptual integration of auditory or phonetic
information. Part of the effect may also be due to occasional perceptual
dominance of implosive over explosive transitions. Whether the perceptual
integration occurs at an auditory or at a phonetic level is not clear at
present. This issue could be further investigated by varying the acoustic
structure of the VC precursor within phonetic categories.



When the closure period is lengthened, the implosive transitions emerge 
as a separate phonemic' percept if they are incompatible with the explosive 
transitions. As the results of the 115-msec condition show, the nature of 
this percept is not consistently influenced by the identity of the postcur- 
sor. However, the mere presence of a CV postcursor biased the perception of 
the VC portion towards labials. This effect can no longer be due to
straightforward perceptual integration, but it probably represents some more
general perceptual interaction, as exemplified also in Experiment I, Task 1
(VC-V stimuli). In terms of Massaro's (1975) theory, the results may be
interpreted to indicate that /ad/ required more processing time than /ab/, so
that a following event interfered more with the former than with the latter.

When separated by a closure period of 265 msec, the perception of
implosive and explosive transitions is largely independent, but there is a
tendency towards small contrastive effects that, surprisingly, are more
consistent in the backward direction than in the forward direction. This may
reflect the lower perceptual salience of implosive transitions. Although the
present VC stimuli were as consistently identified as the CV stimuli in
isolation, their susceptibility to contextual factors seemed to be greater,
perhaps due to the absence of a "protective" continuation of the signal (such
as a release burst might provide). The contrastive postcursor effects are
evidence that, at least occasionally, phonetic decisions about the implosive
transitions are postponed until events occurring as much as 265 msec later
have been phonetically interpreted. Of course, it may be the normal mode of
processing speech to phonetically recode chunks of VCV size or larger. This
agrees well with the results of Experiment II and the earlier RT studies.

ABSTRACT

The results of a recent study (Liberman, I., Shankweiler, Liberman,
Fowler, and Fischer, 1977) suggest that good beginning readers are more
affected than poor readers by the phonetic characteristics of visually
presented items in a recall task. The good readers made significantly
more recall errors on strings of letters with rhyming letter names than
on nonrhyming sequences; in contrast, the poor readers made roughly
equal numbers of errors on the rhyming and nonrhyming letter strings.
The purpose of the present study was to determine whether the
interaction between reading ability and phonetic similarity may be
solely determined by different rehearsal strategies of the two groups.
Accordingly, good and poor readers were tested on rhyming and nonrhyming
words using a recognition memory paradigm that minimized the opportunity
for rehearsal. Performance of the good readers was more affected by
phonetic similarity than was that of the poor readers, in agreement with
the earlier study. The present findings support the hypothesis that good
and poor readers do differ in their ability to access a phonetic
representation.



INTRODUCTION 



Many investigators see the root cause of reading disability in children
as a deficit in visual perceptual learning (for example, Bender, 1957;
Frostig, 1963; Silver and Hagin, 1960). Their research has emphasized the
importance of visual processes such as those involved in the identification
of letter shapes and the scanning of text. However, critical surveys of such
research (Benton, 1962, 1975; Hammill, 1972; Vernon, 1960) produced little
hard evidence to support the hypothesis that visual and directional factors
figure heavily in most cases of reading disability. This conclusion was
reaffirmed by the work of Shankweiler and Liberman (1972), Vellutino, Steger,
and Kandel (1972), Vellutino, Pruzek, Steger, and Meshoulam (1973), and
Vellutino, Steger, Harding, and Phillips (1975).

In view of the repeated failure to establish visual-perceptual deficits
as a major problem in learning to read, several investigators have begun to
examine other cognitive prerequisites for reading acquisition, in particular,
those relating to the child's primary language abilities. These
investigations (for example, Bloomfield, 1942; I. Liberman, 1971, 1973;
Mattingly, 1972; Rozin and Gleitman, 1977; Shankweiler and Liberman, 1976)
have suggested that reading should not be viewed as an independent ability,
but as parasitic upon the spoken language. If reading is a derivative of
speech and acquired by the child only after he has acquired speech, it is
reasonable to consider how learning to read may build upon the earlier
language acquisitions of the young child.

Although both good and poor readers speak and understand the language,
it may be that poor readers have deficiencies in certain subtle aspects of
language development that are not evident even to trained observers. The
present research examines this possibility. Specifically, its purpose is to
explore the role of phonetic recoding in reading acquisition and to
investigate the hypothesis that good and poor beginning readers differ in
their ability to access and to use a phonetic representation.

A notable characteristic of language is that the meaning of the longer
segments (for example, sentences) transcends the meaning of the shorter
segments (for example, words); it follows that a listener would have to
maintain the smaller units in some temporary store until a sufficient number
of them have accrued to enable him to apprehend the meaning. It has been
argued (A. Liberman, Mattingly, and Turvey, 1972) that a phonetic
representation is used for this purpose and that it is uniquely suited to the
short-term storage requirements of language. Our own research has emphasized
two additional functions of the phonetic representation of spoken languages
(Shankweiler and Liberman, 1976; I. Liberman, Shankweiler, Liberman, Fowler,
and Fischer, 1977). We have speculated that a language user may employ a
phonetic representation in order to access his mental lexicon and to
reconstruct the prosodic information that is crucial to understanding speech.
We have also suggested that readers of a language may continue to use a
phonetic representation, just as hearers do, rather than develop a new mode
of processing for the written language.

There is considerable experimental evidence to support the view that
people do employ a phonetic code to store visually presented letters or
words, even under circumstances where it is disadvantageous to do so (for
example, Conrad, 1964, 1972; Baddeley, 1966, 1968, 1970; Hintzman, 1967;
Kintsch and Buschke, 1969). Typical studies presented subjects with letter
or word sequences to be read silently and then recalled. The investigators
usually reported that most confusion errors were based on the sound of the
letter or word, rather than on its visual appearance.

In addition to these considerations, there is reason to believe that
phonetic recoding is of special significance for the beginning reader who is
learning how the alphabet works. Consider the relationship between the
alphabet and the spoken language. English, unlike the logographic writing
system of Chinese and the Japanese Kanji, uses a symbol system, the alphabet,
that is keyed largely to the sound structure of the language. If the child
has learned something about how the spelling reflects the sound structure, he
will be able to offer at least an approximate pronunciation of new words.
However, to take full advantage of the benefits inherent in the symbol
economy of an alphabet, the reader must be able to employ an analytic
strategy, grouping the letter segments into articulatory units and mapping
them into speech, rather than treating words as irreducible wholes
(Shankweiler and Liberman, 1976; Liberman et al., 1977).

However, in order to use an analytic strategy, the reader must recognize
that the alphabet is largely a direct representation of the phonemes in
speech. Whereas the recognition of two spoken utterances like bet and best
as different words is sufficient for the comprehension of these as lexical
items, the process of mapping the written word onto its spoken counterpart
requires, in addition, recognition of the number and identity of the phonemes
contained in the spoken word. There is now considerable evidence to suggest
that the ability to recognize phoneme segments in speech is a predictor of
success in learning to read (Savin, 1972; Helfgott, 1976; Liberman et al.,
1977; Zifcak¹).

In view of the evidence that poor readers have difficulty in performing
phoneme segmentation tasks, it is appropriate to ask whether poor readers are
also deficient in the ability to construct and employ a phonetic
representation. Conceivably, poor readers might attempt to retain script as
shapes, rather than as phonetic entities. Using a recall-memory task, our
research group has found evidence to suggest that good and poor readers do
differ in their phonetic coding ability (Liberman et al., 1977). In that
study, good and poor second grade readers were presented with sequences of
letters for recall. Half of the sequences were composed of rhyming consonants
(drawn from the set B C D G P T V Z), the remainder of nonrhyming consonants
(drawn from the set H K L Q R S W Y). Each of the strings of five upper-case
letters was displayed tachistoscopically for three seconds. The subjects were
instructed to print as many of the letters as they could, either immediately
after presentation or after a 15-sec delay. Recall was scored both with and
without regard to serial position.

Under both recall conditions, the good readers showed significantly
more phonetic interference than the poor readers, as measured by the
differences in total errors between the rhyming and nonrhyming sequences.
Because of this interaction between reading ability and phonetic similarity,
the difference in performance between good and poor readers cannot be
explained by supposing that the two reading groups differ in "general memory
capacity." The differences also cannot be attributed to a serial-ordering
problem in the poor readers, since the effects were significant even when
recall was scored without regard to serial position.

¹M. Zifcak, Phonological awareness and reading acquisition in first grade
children. Unpublished doctoral dissertation, University of Connecticut (in
preparation).

It appeared, then, that the phonetic characteristics of the letter names
had a differential effect on recall in good and poor readers. From this, it
was assumed that the good readers are better able to access and use a
phonetic representation in short-term memory than the poor readers. An
alternative interpretation, however, would ascribe these findings to
differences in rehearsal strategy for the two reading groups.² If the poor
readers were able to rehearse fewer letters than the good readers before
recall began, the rhyming letters would have less opportunity to interfere.
This might give rise to the pattern of results obtained: inferior recall of
the nonrhyming items by the poor readers, but little difference between the
groups on the rhyming letters.

The present experiment was undertaken primarily in an effort to resolve
this ambiguity. A paradigm originally devised by Hyde and Jenkins (1969) for
a different purpose was adapted for this study, because it permits us to test
memory in a way that minimizes the opportunity for rehearsal. The procedure
involves a test list of words followed by a recognition list. The subjects
are not informed at the time of the presentation of the first list that a
subsequent test of recognition memory will follow. Thus, the task appeared
to the child merely as a reading task. If differential rehearsal rates were
responsible for the earlier results, then differences in phonetic similarity
should disappear with this new procedure. However, should the findings of
the present study replicate those obtained in the previous research, there
would be support for the interpretation that the poor readers have a deficit
in accessing or using a phonetic representation derived from script.

A second reason for undertaking the present study was to test the
phonetic coding ability of the two groups of readers in a task more nearly
resembling a realistic reading situation. This was accomplished by using
words, rather than letter strings, as the stimulus items.

METHOD

Subjects

The subjects were second grade school children in the Mansfield,
Connecticut public school system. Children were selected for pretesting on
the basis of their total reading grade on the Stanford Achievement Test
(SAT), which had been administered by the schools during the fourth month of
the school year. In this preliminary screening, children with total reading
grades between 3.5 and 5.0 on the SAT were candidates for the good reading
group, while those with reading scores between 1.5 and 2.4 were considered
for the poor reading group. Final selection of the two reading groups from
among these children was made in the seventh month of the school year by
administering the word recognition subtest of the Wide Range Achievement Test
(WRAT) (Jastak, Bijou, and Jastak, 1965). The criterion for inclusion in the
good reading group was a WRAT grade level between 3.1 and 5.0. A child was
selected for the poor reading group if his WRAT grade level was in the range
of 1.5 to 2.4.

Thirty-seven children (19 good readers and 18 poor readers) met the WRAT
criteria for participation in the experiment. Seven subjects (four good and
three poor readers) had to be dropped because their data were incomplete due
to an experimenter error. Another poor reader had to be excused from the
experiment because he was unable to read more than 50 percent of the words on
the recognition list (see Scoring Method). Thus, the data analysis was based
on the performance of 15 good readers with a mean WRAT grade level of 3.97
(range: 3.1 to 4.5) and 14 poor readers with a mean WRAT grade level of 2.19
(range: 1.5 to 2.4).

The good readers had a mean age of 92.4 months, while the mean age of
the poor readers was 94.0 months [t(27) = .97, p < .40]. The relative
intelligence (IQ) of the two reading groups was assessed by the Wechsler
Intelligence Scale for Children, Revised Edition (Wechsler, 1974). The good
readers had a mean Full Scale IQ of 114.2 (Verbal Scale IQ = 113.1,
Performance Scale IQ = 112.5). The Full Scale, Verbal, and Performance IQ
means for the poor readers were 109.0, 106.4, and 110.9, respectively. The
intelligence scores of the two reading groups did not differ significantly on
any of the three scales: Full Scale, t(27) = 1.05, p < .40; Verbal, t(27) =
1.52, p < .20; Performance, t(27) = .29, p < .80.

Word Lists 

The word lists consisted of monosyllables chosen from Part One of the
Cheek Master Word List (Cheek, 1974). The words (see Table 1) were limited
to the first grade level (1.0 - 2.0) in order to ensure that the poor readers
could read the bulk of the words presented, despite their reading handicap.

The initial list was composed of 28 words. The recognition list
included the 28 words on the initial list and an equal number of words, the
foils, not present on that list. Fourteen of the foils were phonetically
paired with a word on the initial list. These are the phonetically similar
(that is, rhyming) items. Word pairs were classified as phonetically similar
if they met both of the following criteria: (1) they must share the same
vowel sound; (2) they can differ by no more than three consonantal phonetic
features in the set of "place", "manner", "voicing" and "nasality"
(Wickelgren, 1966). If a set of two words failed to meet either or both
requirements, they were considered to be phonetically dissimilar.

The phonetically similar foils, additionally, had to meet the requirement
that they be as different as possible in visual configuration from all
words on the initial list (for example, my-high, know-go). The decision to
make this requirement was motivated by the possibility that some subjects
might be responding primarily to the visual appearance of the word, thereby
potentially confounding the results. The remaining 14 foils were both
phonetically and visually dissimilar to words on the recognition list.

Given the constraint of having to select words from a first grade
reading list, it was impossible to maintain strict criteria for visual
dissimilarity. However, it was important to have some measure of the
relative visual similarity of the two foil types to words on the initial
list, so that possible visual coding strategies would not confound the
results. Accordingly, several informal criteria of visual similarity were
followed: (1) the two words had the same number of letters; (2) the initial
letters in the words were the same; (3) the initial letters in the words were
of the same shape (see below); (4) the final letters in the words were the
same shape.

TABLE 1: List of Phonetically Similar Word Pairs and Phonetically
Dissimilar Words.

Phonetically Similar Word Pairs

    Old      Foil

    know     go
    my       buy
    cry      high
    good     could
    they     way
    but      what
    gum      come
    shoe     two
    new      do
    bird     word
    your     for
    said     red
    run      done
    door     more

Phonetically Dissimilar Words (old words and foils)

    year, best, life, guess, each, walk, ride, help, our, keep, did, not,
    cakie, see, duck, friend, oh, up, off, jump, box, told, bring, yes,
    face, gave, brown



In the following chart, the lower-case letters are grouped into four
categories reflecting "similar shape" according to a scheme devised by the
authors.

Lower Case Letter Shapes

a. short curved - c o e a s m n r u

b. short straight - v w x z i

c. tall above line - h d b f l t k

d. tall below line - p q g j y

A visual-similarity matrix was constructed to compare each foil word
with each word from the initial list. The numbers entered in a particular
cell indicated the dimensions of visual similarity shared by a particular
word-pair. The relative visual similarity of the two foil types to the words
on the initial list was computed by taking the total number of times each of
the four criteria was satisfied for each foil; thus, four totals were
obtained for each foil word. Separate t-tests were performed on the four
visual similarity measures derived for the two types of foils. No t-test was
significant beyond the .05 level. This suggests that the two sets of foils
were roughly comparable in visual similarity to words on the initial list.
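
The tallying of the four visual-similarity criteria can be illustrated with a
short sketch. This is not the authors' procedure as implemented (the matrix
was presumably compiled by hand); the function names are hypothetical, and
only the letter-shape categories and the four criteria are taken from the
text.

```python
# Letter-shape categories copied from the chart in the text.
SHAPES = {
    "short curved": "coeasmnru",
    "short straight": "vwxzi",
    "tall above line": "hdbfltk",
    "tall below line": "pqgjy",
}


def shape_of(letter):
    """Return the shape category of a lower-case letter."""
    for name, letters in SHAPES.items():
        if letter in letters:
            return name
    raise ValueError(f"no shape category for {letter!r}")


def criteria_met(foil, old):
    """Return four booleans, one per informal criterion (1)-(4)."""
    return (
        len(foil) == len(old),                    # (1) same number of letters
        foil[0] == old[0],                        # (2) same initial letter
        shape_of(foil[0]) == shape_of(old[0]),    # (3) initial letters same shape
        shape_of(foil[-1]) == shape_of(old[-1]),  # (4) final letters same shape
    )


def similarity_totals(foil, initial_list):
    """For one foil, count how often each criterion is met across the initial list."""
    totals = [0, 0, 0, 0]
    for old in initial_list:
        for i, met in enumerate(criteria_met(foil, old)):
            totals[i] += int(met)
    return totals


# Hypothetical usage with two words taken from Table 1.
print(similarity_totals("go", ["know", "good"]))
```

Each foil thus receives four totals, one per criterion, which is the form of
data on which the separate t-tests described above would operate.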

Some words had more than one rhyming counterpart (for example, my-high,
cry-buy). As a result, some foils were phonetically similar with a second
word on the initial list. This somewhat undesirable situation arose with the
need to increase the size of the word list, which was constrained by the
limits of a first grade reading list.

Words with phonetically similar foils were equally distributed in each
half of the initial list. Each half of the recognition list contained an
equal number of words from four sets: phonetically-similar old words,
phonetically-dissimilar old words, phonetically-similar foils, and
phonetically-dissimilar foils. In addition, half of the rhyming foils
preceded their rhyming counterparts from the initial list, while the
remaining foils appeared after their counterparts from the initial list.

The words were hand-printed in lower case on white, three-by-five cards,
using a black, felt-tipped pen. The short letters were 1/4 inch high, the
tall letters 1/2 inch high.

Procedure



The children were assigned at random to one of two examiners who tested
them individually.



Initial list. At the start of the experiment, the child was told that
some words were going to be shown to him one at a time. He was instructed to
read each word aloud and then to wait until the next word was shown. Each
word was shown for as long as it took the child to pronounce it. If the
child read the word incorrectly, the experimenter indicated this on the
scoring sheet; no attempt was made to correct the child. However, if the
child corrected himself spontaneously, the word was scored as having been
read correctly.

Recognition list. After completing the initial list, the child was
informed that he was going to be shown a second list of words, one at a time.
(No mention of this had been made previously.) His task was to read each word
aloud and then to say "yes" if he believed the word was on the old list or
"no" if he believed it was not. The experimenter recorded both the child's
recognition response ("yes" or "no") and whether the child read the word
correctly. Before presentation of the recognition list, the examiners
verified the child's comprehension of the instructions.

Scoring Method 



Reading errors. Any word that was misread on either list was excluded
from analysis of that child's recognition judgments. If the child misread a
word on the initial list that rhymed with a foil on the recognition list, the
recognition response to the phonetically similar foil was also discarded,
except in cases where the foil rhymed with another word on the initial list
(see previous section). These exclusions were necessary in order to ensure
that errors in recognition judgments could be attributed with confidence to
phonetic similarity with a word on the initial list. Any child who misread
more than 50 percent of the words on the recognition list was dropped from
the experiment.

Recognition judgments. A child's recognition performance on each of the
four word sets was expressed as the ratio of the number of recognition errors
to the total number of words read correctly in each set.
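
As an illustration of this scoring rule, the following sketch computes the
error ratio for one child on one word set. The data layout (a list of
boolean pairs) and the function name are assumptions made for the example;
the paper does not describe how the scores were tabulated.

```python
def error_ratio(judgments):
    """Recognition errors divided by the number of words read correctly.

    `judgments` is a hypothetical list of (read_correctly, recognition_error)
    pairs for one child on one word set. Misread words are excluded from
    both the numerator and the denominator, as the scoring method requires.
    """
    scored = [error for read_ok, error in judgments if read_ok]
    if not scored:
        return None  # no correctly read words, so the ratio is undefined
    return sum(scored) / len(scored)


# Hypothetical child: four foils, one misread, one false positive among the rest.
example = [(True, True), (True, False), (False, True), (True, False)]
print(error_ratio(example))
```

In this made-up case the misread word drops out, leaving one error in three
correctly read words.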

RESULTS 

If the findings of Liberman et al. (1977) can be taken to reflect
differences between superior and poor readers in phonetic recoding, then we
may expect the following results in the present study: the good readers
should make significantly more recognition errors on the rhyming foils than
on the nonrhyming foils; the poor readers, on the other hand, should generate
approximately equal frequencies of errors on the two types of foils. If,
however, both reading groups make equal numbers of errors on each foil type,
then we may suppose that opportunity for rehearsal, which was a feature of
the previous investigation but not of the present one, may have accounted for
the interaction between reading ability and phonetic similarity reported
earlier.


Recognition Judgments

Two types of recognition errors will be considered. Of primary interest
are the "false positive" errors: the child reports a word as having occurred
on the initial list when, in fact, it was a "new" word. The "false negative"
error, which occurs when the child fails to recognize an "old word" as having
appeared on the initial list, will also be considered.

Figure 1: Percent false positive recognition errors as a function of reading
ability and foil-type. (Horizontal axis: foil-type, rhyming and non-rhyming.)

False positive errors. The mean percentages of recognition errors for
the two types of foils (rhyming and nonrhyming) were computed. For the good
readers, the error rate was strikingly higher on the rhyming foils (20.4
percent) than on the nonrhyming foils (4.8 percent). In contrast, the poor
readers showed little difference between the percentage of "false positive"
errors made on the rhyming foils (16.0 percent) and the nonrhyming foils
(12.4 percent). Because of the apparent heterogeneity of variance shown by
the good readers on the nonrhyming foils relative to rhyming foils, a
nonparametric statistic, the Mann-Whitney U-Test (Mann and Whitney, 1947), was
used to assess the significance of the phonetic characteristics of the foils.
For the good readers, the difference between the mean recognition errors
on the two foil categories was highly significant [U(15,15) = 26; p < .002],
whereas for the poor readers the error difference between rhyming and
nonrhyming foils was not significant [U(14,14) = 80; p > .10].

The interaction between reading ability and foil-type (Figure 1) was
examined by comparing the difference between the error scores on the rhyming
and nonrhyming foils for the two reading groups. The mean error difference
was 15.5 percent for the good readers and 3.5 percent for the poor readers
[U(15,14) = 23.5; p < .002]. These data strongly support the interpretation
of the interaction between reading ability and responses to phonetic
similarity that was offered by Liberman et al. (1977).
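
For readers unfamiliar with the statistic, the U values reported above can be
understood through the standard pair-counting definition of the Mann-Whitney
test. The sketch below is a generic illustration, not the authors'
computation: the function name is hypothetical, the sample scores are
invented, and no significance levels are derived.

```python
def mann_whitney_u(xs, ys):
    """Return the smaller of the two U statistics for samples xs and ys.

    Every (x, y) pair is compared; a pair counts 1 when x exceeds y and
    0.5 when the two scores are tied. The two U values always sum to
    len(xs) * len(ys), so the second follows from the first.
    """
    u_x = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    u_y = len(xs) * len(ys) - u_x
    return min(u_x, u_y)


# Invented difference scores (rhyming minus nonrhyming error percentages).
good = [21.0, 14.0, 7.0, 28.0]
poor = [0.0, 7.0, 3.0]
print(mann_whitney_u(good, poor))
```

A small U indicates little overlap between the two sets of scores; with
group sizes of 15 and 14 the largest possible value of the smaller U is
105, against which the reported 23.5 can be judged.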

False negative errors. It is somewhat misleading to make a simple
division of the old words into those with rhyming foils and those without a
rhyming foil. On the recognition list, a word with a phonetically similar
foil is indistinguishable from phonetically dissimilar old words until the
appearance of its rhyming foil; only those old words that follow their
rhyming foil on the recognition list can be said to differ from the
nonrhyming old words. In comparing recognition judgments of rhyming and
nonrhyming old words, it is reasonable to consider as "phonetically similar
old words" only the words that appear after their rhyming foils;
consequently, all other repeated words must be viewed as nonrhyming old
words. Using this criterion for categorizing old words, the frequency of
"false negative" recognition errors for the good readers was 23.8 percent on
the rhyming old words, and 28.8 percent on the nonrhyming old words. The
comparable error rates for the poor readers were 18.8 percent and 19.6
percent, respectively.

The pattern of false negative errors reflects a tendency on the part of
the good readers to say that a word from the initial list was "old" when it
followed its rhyming foil. Thus, for the good readers, words on the initial
list that followed their rhyming foils on the recognition list more
frequently evoked "yes" judgments than did words that lacked rhyming
counterparts. The poor readers showed no such tendency. They made a nearly
equal number of "yes" responses to phonetically similar and dissimilar words.
Thus, the recognition judgments of repeated words reinforce the indications
from the analysis of the false positive errors that good readers have a more
persistent phonetic representation in short-term storage than do poor
readers.

Reading Errors

Table 2 shows the mean percentage of misread words by the good and poor
readers on each of the four sets of words (phonetically-similar old words,
phonetically-dissimilar old words, phonetically-similar foils, and
phonetically-dissimilar foils). As noted in the description of scoring
procedures, recognition judgments of words that were misread on either list
were not included in this tally. In addition, when a misread word rhymed with
one of the foils on the recognition list, the recognition judgment on that
foil was also excluded. As would be expected, the good readers made
considerably fewer errors than the poor readers. In fact, 13 of the 15 good
readers made no reading errors at all. The poor readers, on the other hand,
misread an appreciable number of words. This is a matter for concern only if
their errors are unequally distributed among the four sets of words. In that
event, one could question the reliability of the differences in false
positive recognition errors, the finding of major interest. However, from
inspection of Table 2, it may be seen that roughly the same proportion of
misreadings occurred on each of the four sets. This impression was
substantiated by the results of a two-factor within-subjects analysis of
variance in which phonetic similarity-dissimilarity was treated as one factor
(P) and old and new (foil) words were treated as the other factor (R).
Neither factor was significant [F_P(1,13) < 1; F_R(1,13) < 1]. It is apparent
that the errors were indeed equally distributed among the four sets of words.
Thus, the differences between the reading groups in the distribution of
recognition errors on rhyming and nonrhyming foils cannot be attributed to a
tendency on the part of the poor readers to make more errors in reading the
words of some sets than of others.



TABLE 2: Reading errors as a function of opportunity for good and poor
readers.

Reading Group                      PSf      PDf      PSo      PDo

Good (n = 15)   Errors               6        1        4        2
                Opportunities      210      210      210      210
                Percent            2.9      0.5      1.9      1.0

Poor (n = 14)   Errors              27       30       30       34
                Opportunities      196      196      196      196
                Percent           13.8     15.3     15.3     17.3

PSf - Phonetically Similar Foil
PDf - Phonetically Dissimilar Foil
PSo - Phonetically Similar Old Word
PDo - Phonetically Dissimilar Old Word



DISCUSSION

In a recent study (Liberman et al., 1977), good beginning readers were
found to be more affected than poor readers by the phonetic characteristics
of visually-presented items in a recall task. We attributed this result to
differences between the groups' abilities to employ phonetic representation.
The possibility has been raised, however, that differences in rehearsal
strategy may account for the finding. The major aim of the present
experiment was to clarify the interpretation of the earlier study by using a
task in which rehearsal was not a factor. For this purpose, a recognition
memory paradigm was used instead of a recall task. The advantage of this
procedure is that it does not alert the child to rehearse the target items,
because he is not informed in advance that his memory of these items will be
tested.



A secondary aim of the present experiment was to demonstrate the
differential effects of phonetic similarity on good and poor readers in a
task that employs words rather than arbitrary letter sequences, thus
extending the earlier findings to a situation that more closely approximates
an actual reading task.

The results are summarized in Figure 1: the good readers made fewer
recognition errors on the nonrhyming foils relative to their performance on
the rhyming foils; in contrast, the poor readers made roughly equal numbers
of errors in recognition judgments on the two types of foils. The
confirmation of the interaction between reading ability and phonetic
similarity with this new task that minimizes possible rehearsal effects
suggests that the earlier findings cannot be attributed solely to differences
in rehearsal strategy between good and poor readers. The data, therefore,
tend to support the hypothesis that the two reading groups differ in their
use of a phonetic representation.

It might be concluded, then, that poor readers have a specific
difficulty in accessing a phonetic representation derived from script. There
is reason to believe, however, that the poor readers' difficulties in making
effective use of a phonetic representation are of a more general nature and
not limited to recoding from script. The evidence comes from a study
reported by Shankweiler and Liberman (1976) that was a sequel to the Liberman
et al. (1977) visual recall experiment. The point of that study was to
create an auditory analog of the earlier experiment, in which the letter
strings would be presented on magnetic tape instead of tachistoscopically.
Since phonetic coding is presumably unavoidable when speech is presented
auditorily, both reading groups in the auditory experiment would thus be
forced to code the incoming speech signal phonetically. If the poor readers'
essential difficulty was specific to recoding visually presented script, the
auditory version of the recall experiment should yield different results; the
statistical interaction between reading ability and phonetic similarity,
obtained in the previous study, should disappear. However, if the
interaction remained, it would suggest that the phonetic recoding differences
between good and poor readers are not specifically tied to the conversion
from print to speech, but rather that the poor readers' deficit extends to
heard speech as well as written language.

The results of these new experiments were nearly identical to those
using visual recall. As before, the good readers showed significantly more
phonetic interference than the poor readers. Thus, it may be concluded that
the nature of the poor readers' deficit is related to the accessing and use
of a phonetic representation, regardless of the source of the linguistic
information. Further investigation of the circumstances that limit access to
the phonetic representation is likely to contribute to an understanding of
the sources of difficulty in learning to read.

Interactive Experiments with a Digital Pattern Playback*
Patrick W. Nye, Franklin S. Cooper, and Paul Mermelstein



ABSTRACT 

Among the most useful tools for speech research have been
those that enable spectrograms to be compared with one another,
that provide ways of modifying speech data and that permit the user
to listen to the modified speech signal. This paper reports an
experiment in which such an interactive research tool--a Digital
Pattern Playback (DPP)--was used to evaluate a spectrum-matching
and dictionary-search technique for speech recognition. The DPP
was used to display spectrograms of "unknown" sentences. An
analyst divided these sentences into segments of word-length and
listed their important acoustic features. Using these features, an
interrogation program examined a feature-based spectrographic
dictionary and recovered all the words having features that matched
each unknown segment. When necessary, additional features were
assigned to narrow the search. The reference spectrograms
retrieved from the dictionary were compared, one at a time, with
the spectrograms of the unknown sentence, and the best match was
selected for each unknown segment. In general, the performance of
the human analysts was found to be quite low, since only 26 percent
of the words contained in the sentences were matched correctly.
The paper concludes with a discussion of the factors governing
human and machine performance on spectrogram matching.

INTRODUCTION 

This paper describes results obtained from a speech analysis experiment 
that explored methods for organizing the information required for automatic 
speech recognition. The experiment required that the analysis operations be 
performed by two human subjects, who worked from visual displays. These 
analysts studied the spectrogram, waveform, and amplitude functions of an 
unknown sentence and divided the sentence into word-length segments. Having
listed the most salient features of each segment, the analysts then sought a 
set of matching reference words that were retrieved automatically from a 
feature-labeled dictionary. The identities of the reference words were not 
known to either of the analysts whose data are reported in this paper. Thus,
syntactic and semantic considerations did not play a direct part in the
selection of suitable matches.



*This paper was presented in part at the 90th meeting of the Acoustical 
Society of America, San Francisco, Calif. November 3-7, 1975. 

[HASKINS LABORATORIES: Status Report on Speech Research SR-49 (1977)]

Ingemann and Mermelstein (1975) have reported the results of some
similar experiments that were carried out with conventional paper
spectrograms. Their experience showed that the retrieval problems became
serious when subjects were required to work with reference libraries as large
as 100 words. The present work represented a continuation of those
experiments but avoided the inconvenience of handling volumes of paper by
using a computer-based display system.

THE DISPLAY SYSTEM

The speech signals were displayed by an interactive research tool--
called the Digital Pattern Playback (DPP)--which has been built around a PDP
11/45 and GT40 computer system (Nye, Reiss, Cooper, McGuire, Mermelstein, and
Montlick, 1975). The system organization is sketched in Figure 1. The PDP
11/45 runs a general-purpose operating system allowing multiprogram access
from several terminals. The GT40 supports the display functions. The
analyst, seated at the keyboard, can selectively access the PDP 11/45 or the
GT40. Using this facility, he may display two spectrograms lying one above
the other on the same screen--each representing 1.6 secs of speech (see
Figure 2). The lower spectrogram display field is usually occupied by a
reference item that has been selected from the dictionary and installed there
for direct comparison with the unknown. A cursor, controlled by a knob, can
be moved to any point along the time axis of the upper, unknown spectrogram
and the cross-section at that point can be displayed. A similar
cross-section facility is also available for the lower spectrogram. In
addition, the user has the freedom to examine waveform plots for the unknown
at points indicated by the cursor, and to examine the intensity and
fundamental frequency functions of selected segments of speech data. Other
facilities include provisions for manipulating speech spectra and hearing the
results through a channel vocoder. The system forms a general speech
analysis-synthesis facility, only a few of whose capabilities were employed
in the experiments described here.

ORGANIZATION OF THE RETRIEVAL PROGRAM

Each of the reference spectrograms consisted of a candidate word
presented in the sentence frame "Please say ___ again." These spectrograms
made up a lexicon of 100 reference items of which 20 had both stressed and
unstressed forms represented, giving a grand total of 120 entries. The items
were stored on a disk in such a way that they could be selectively retrieved
by means of a specially designed program that also collected data on each
analyst's decisions and analysis procedures. A general model of this process
is given in Figure 3.

Before commencing the experiment, the two analysts were each asked to
select a personal set of up to 16 descriptive features that were considered
to be useful in correctly selecting matching words from the lexicon. Each
analyst then used his chosen features to label each member of the reference
list. Any one of three discrete values could be assigned to each feature:
present, absent or unspecified.

The retrieval program listed the features that an analyst found in a
word-segment of the unknown sentence and used this list (or feature vector)
Figure 1: An overall view of the DPP showing its principal components. The
physical components actually used by the analysts were the HP
display and cursor, the GT40 display and the keyboard. (Labeled
components: spectrum analyzer, controller, cursor control,
audiotape recorder/player, keyboard, disk drives and other mass
storage devices, 11/45 CPU and memory, terminal interfaces, bus
window.)

Figure 3: Model of the information pathways available to the analysts who
formulated an hypothesized spectrogram from a sequence of
reference-word units.



An additional property of the program was that "unspecified" feature values,
in reference words, matched both "present" and "absent" assignments of
features in the unknown segment.

» Wliil.> iiKitrhin): pi o)M"^iin was nnil<M- way, .in .nia I v :i I *^ r (.n 1 il r.pi'iitv 

.'Kid il iona 1 N.iMui' i n I v>Mn.il i on abonL 'tlw' nnknown by chan|;,1flU', valnrs oi 
s pre living; previous Iv un S'pt* r i t i valur;;. A fit r n.u i 1 y , Ik- rt'las 
ttMtuiv ass i v.niiit'iu s In' i nr ^^as i nj'. t hr lumibtM- oi iins pt' c i t i rd IraMiros and 
tMuM'iMvy incr«'ast' i b»* bu- nl i b»^ mati'biny: Wiwd lisl. "i'br lunubtM" ot r r I r I'l'nci' 
s pi'Cl r(\i'. rams I Iku in.it iln'd "any " sp*;'^' ' I ^ ^'d Usil. nn'-vr n i' coil.K^ b»' raf)idly 
thMtManin^d. In t hr <'V»'nl tbal I inaitv i / 1 im*.mk*»^ ittaiis mat t*brd ihc spiciiiod 
Iralurt'S, Lbr .malvsi- was aHowmi t c> r^'valn*' f Mlnrrs in I lie rrl»rt>icr lisl 
to achit'vo r,'i'«'a(. '.T pr 'U- i s i on . Wlnni Ibo nuinbrr al n^lriovrd it'curi "Vol! to a 
nt I i c i .mU 1 V low lovoi, Ibo analvsl ooiald S( an llirongb ftioui one by one, oach 
t. imr d i s[^l ay,iiu^. t. bo j\o,t fnt i a 1 inaLfb abovv t bo unknown. In ordor fo tnakt^ a 
unifpio soloclion, boYotild t.bon invhko additional in t oriiial: ion not included in 
f hr proviouti' iVat.uro a s s- i ^' ninonl ; lor oxainplo, oxpocLod t\irmanL sliifts Ir^m 
tin- rrlorono.' loi^'fii f o lil'liio apparent conloxL ot (ho unknown; If nono of 
tlio" ml r i i^vcil ilonis nialobod sntficionlly wo 1 1 , .tbo foaluro as s j ^^aii»»'"t" 'was 
t hon inoditi<'d lo .solool a now 1 i ,^7l oi niai:chinK wojhIs . 
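
The retrieval rule described above can be sketched in a few lines of Python.
This is only an illustrative sketch, not the original retrieval program: the
feature names, the three-word lexicon and the function names are
hypothetical, and it assumes that an unspecified value on either side is
compatible with anything.

```python
from enum import Enum


class Value(Enum):
    PRESENT = "+"
    ABSENT = "-"
    UNSPECIFIED = "?"


def feature_matches(unknown, reference):
    """Two values clash only when one is PRESENT and the other ABSENT."""
    if Value.UNSPECIFIED in (unknown, reference):
        return True
    return unknown == reference


def retrieve(unknown_vector, lexicon):
    """Return reference words compatible with every feature of the unknown."""
    hits = []
    for word, labels in lexicon.items():
        if all(
            feature_matches(value, labels.get(name, Value.UNSPECIFIED))
            for name, value in unknown_vector.items()
        ):
            hits.append(word)
    return hits


# Hypothetical feature labels for three reference words.
LEXICON = {
    "salt": {"initial_fricative": Value.PRESENT, "final_stop": Value.PRESENT},
    "assault": {"initial_fricative": Value.ABSENT, "final_stop": Value.PRESENT},
    "human": {"initial_fricative": Value.PRESENT, "final_stop": Value.UNSPECIFIED},
}

strict = {"initial_fricative": Value.PRESENT, "final_stop": Value.PRESENT}
relaxed = {"initial_fricative": Value.UNSPECIFIED, "final_stop": Value.PRESENT}

print(retrieve(strict, LEXICON))   # the narrower candidate list
print(retrieve(relaxed, LEXICON))  # relaxing a feature can only lengthen it
```

Relaxing a feature to "unspecified" never removes a candidate, which is why
the analysts could use it to enlarge a list that had become too short.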

The analyst could also display a series of potential word matches in an
appropriate order, side by side, and judge whether coarticulation effects
could account for the remaining discrepancies between the reference words and
the unknown. After the analyst had arrived at a hypothesized reference-word
sequence that satisfied his criteria, the sequence of items was given to the
original speaker to be spoken in the same tone of voice and with the same
intonation pattern used in the original unknown sentence. This production
form of the matched sequence was then added to the data base for the
analysts' examination. At this point, new reference words could be
substituted where the analyst noted that a mismatch with the unknown sentence
had occurred.

The record-keeping section of the retrieval program noted the number of
searches of the reference library that were made by both the analysts and all
of the reference words that they examined. This record allowed the authors
to trace the significant information feedback paths in the system, namely
those that resulted in new searches of the reference library with differing
feature vectors. These feedback paths are noted in Figure 3. The extent to
which lexical information can modify an analyst's segmentation and feature
assignment was not surprising. In fact, through this attempt to model
explicitly the information flow among the various subtasks of the analysis
process we have uncovered a structure similar to the model for speech
recognition proposed by Fant (1970) nearly 7 years ago.

EXPERIMENTAL OBSERVATIONS

Both analysts found the 16 assignable features to be insufficient and
would have used a larger number, had there been provision to do so. However,
even the assignment of sixteen features to 120 reference items was very
time-consuming. In order not to impose any prior feature organization on our
analysts, all features were considered equally important in establishing a
match. The analysts were frustrated by the necessity to explicitly mark the
absence of many features, a requirement imposed by the single-level feature
organization. Use of a multilevel or hierarchic feature organization,
necessitating the selection of secondary features only if they were
appropriate in the light of specific assignments of primary features, would
have overcome this difficulty.

Both analysts found little difficulty establishing reference-word
matches to the prominent words of the unknown sequence. In fact, they were
surprised to discover how little information (possibly only 3 or 4 features)
sufficed for the retrieval of no more than 6 matching items. More severe
difficulties were encountered in attempting to find the matches for the less
prominent words and syllables. Here the analysts did not trust their feature
assignments, an indication of the difficulty that they encountered in making
those assignments in the first place. One analyst resorted to an exhaustive
scanning of the list of unstressed reference items. The other compared pairs
of stressed and unstressed reference items to infer which features could be
expected to be harder to detect under reduced stress. He then relaxed the
feature assignment for the corresponding unstressed items.

The second analyst attempted to overcome the word segmentation problem
by selecting prominent syllables around which to organize a retrieval
attempt. The ability to look at variations in the spectrum envelope as the
cursor swept through successive time intervals of the spectrograms proved to
be quite helpful in selecting the most prominent syllable of a sequence.
Organizing the retrieval strategy around prominent syllables permitted the
rapid examination of alternative hypotheses. For example, the first
hypothesis might be a monosyllabic stressed word, the second a bisyllabic word
with an additional unstressed syllable. Information about additional
consonantal segments could be added to the feature vector used for retrieval
until the number of retrieved items was small enough to be individually
scanned. Even though only a few salient features located near the prominent
vowel were assigned, the retrieval process frequently resulted in an obvious
match to a much longer segment of the unknown.

The features describing vowel color were not found very useful by either
analyst. There are two reasons that may account for this finding. First,
contextual influences on the vowel formant frequencies of both the reference
word and the unknown word-segment made reliable feature assignment difficult.
Second, very few of the reference items differed by vowel color alone. Thus,
the specification of vowel color features did not significantly reduce the
number of retrieved matches in contrast to leaving them unspecified.

The one analyst who attempted to make use of segment duration in his
feature assignment found it to be useful only in extreme cases. For the most
part, the segmental durations of unknown words varied considerably as a
function of stress, syntactic role and position in the sentence, making small
durational differences ineffective for discrimination purposes.

Results

The average proportion of words that the two analysts succeeded in
correctly matching was only 26 percent, and this figure did not increase
after one cycle of feedback. Although one error was corrected, an additional
error was introduced in the words hypothesized on the second attempt. The
overall word-matching performance was thus significantly lower for the
machine-assisted word-matching experiment than for the similar experiment
conducted with conventional spectrograms by Ingemann and Mermelstein (1975).
There are several possible reasons for this deterioration in matching
performance. The relative unfamiliarity of the display (in particular, the
way acoustic features seen on the DPP are affected by the limited time
resolution of the display) may have been one factor. More importantly,
perhaps, the sentence in the current experiment was longer (21 words vs. 16
words) and somewhat more complex. The lexicon used in the DPP experiment
intentionally included more words that had close phonetic similarities to the
unknown words of the sentence.

The word-identification scores are broken down by analyst, stress, and
number of syllables in Table 1. While 52 percent of the words that contained
at least one stressed syllable were correctly identified overall, practically
all of the matches with unstressed words were incorrect. Overall performance
on multisyllabic words was somewhat higher than on monosyllabic words. Here
the relative performance of the subjects differed significantly. The analyst
who used the strategy that focused on prominent syllables did better on
monosyllabic words but worse on multisyllabic words. The strategy led to
frequent errors on the unstressed syllable of a multisyllabic word,
particularly when phonetically similar words were included in the lexicon.
Substitutions in the unstressed syllables of those words were quite frequent.
Examples of such substitutions are "immunity" for "community", "human" for
"humor", "arrive" for "derived", and "salt" for "assault."



TABLE 1: Percent correctly identified words.

                                  Pass 1                 Pass 2
                      Tokens  Analyst 1  Analyst 2  Analyst 1  Analyst 2

Monosyllabic words      15       33         20         27         20
Multisyllabic words      6       17         33         17         50
Unstressed words        11        9          0          0          0
Stressed words          10       50         50         50         60
All words               21       29         23         23         29

Percent correctly identified syllables

All syllables           30       43         47         43         47
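
The 26 percent average quoted in the Results can be checked against the
first-pass columns of Table 1. The sketch below converts the tabulated
percentages back to whole-word counts; these counts are inferred by rounding
and are not reported in the text, and the variable names are illustrative
only.

```python
# Token counts and first-pass percentages as tabulated in Table 1.
TOKENS = {"monosyllabic": 15, "multisyllabic": 6}
PASS_1 = {
    "Analyst 1": {"monosyllabic": 33, "multisyllabic": 17},
    "Analyst 2": {"monosyllabic": 20, "multisyllabic": 33},
}


def words_correct(percentages):
    """Infer whole-word counts from rounded percentages (an assumption)."""
    return sum(round(pct * TOKENS[kind] / 100) for kind, pct in percentages.items())


total_words = sum(TOKENS.values())
counts = {analyst: words_correct(p) for analyst, p in PASS_1.items()}
mean_percent = 100 * sum(counts.values()) / (len(counts) * total_words)

print(total_words)          # words in the unknown sentence
print(counts)               # inferred number of correct words per analyst
print(round(mean_percent))  # average percent correct over both analysts
```

On these inferred counts the two analysts together matched 11 of 42 word
tokens on the first pass, which is consistent with the reported average.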
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CONCLUSIONS 



The single most important observation to emerge from the results is the
poor performance of the analysts on unstressed words. A reference token,
whether spoken in a stressed form or in a different unstressed environment,
does not provide sufficient information to enable the analyst to effect a
match. Perhaps a larger number of reference tokens taken from a variety of
contexts in which the word may occur might be useful, since it is evident
that analysts are usually unable to predict the transformations that the
acoustic features of words can undergo if they are uttered in phonetically
different contexts. Analysts generally judge similarity in terms of common
features between the unknown and reference tokens. They do not pay
particular attention to the variability of those features and thus do not
differentiate among the features according to their reliability in
establishing matches. It seems likely that intensive learning sessions on
the variability of acoustic features are required before improved word
matching results can be obtained.

The lack of any significant improvement following feedback of the
hypothesized words spoken as a sentence is probably due to the fact that the
overall performance was initially too low (that is, the initial hypothesis
was offered with such a low level of confidence that it contributed as much
to the analyst's uncertainty as it did to his knowledge). It appears that a
higher minimum performance must be reached before the information
supplied by feedback can be usefully absorbed. If an unknown word is
embedded in the correct context, its appearance is likely to be quite similar
to its form in the unknown sentence. However, if the context is incorrect as
well, a new production of the reference form is obtained that may not be any
more similar to the unknown than it was to the original.

Let us now consider the prospects for implementing an entire feature
assignment and word-matching procedure in algorithmic form for execution by a
machine. The selection of matching words on the basis of assigned feature
values is clearly the easiest procedure to implement, and, in fact, this has
already been successfully carried out. Heuristics are available for the
assignment of values to most acoustic features and, therefore, we can expect
that this analysis procedure can be implemented at a cost that increases
roughly linearly with the number of features used. We anticipate more
difficulty, however, with the process labeled "similarity". We are not, as
yet, able to quantify a general similarity metric that assigns perceptually
appropriate weights to specific differences. Events of short duration, such
as bursts, may contribute a great deal to measures of similarity, whereas
differences in events of longer duration, such as shifts in formant
frequencies in vocalic intervals, may be of less significance.



It is possible that the comparison of word-sequences might be
implemented with the aid of a speech synthesis program; however, it appears
that finding an appropriate metric of similarity is the most difficult
problem. Given any general difference measure, we do not yet know how to
separate differences between speakers from differences between words, and
until we can learn what the important distinctions are that we must look for,
word identification through spectrum matching by a human analyst, or by a
machine, will not be a practical art.

The Function of Strap Muscles in Speech: Pitch Lowering or Jaw Opening?*
James E. Atkinson† and Donna Erickson



ABSTRACT 

This paper reports on one aspect of a continuing study to
determine the physiological correlates of the changes in fundamental
voice frequency (F0). Several electromyographic (EMG) studies
with speech have reported an association of strap-muscle activity,
particularly the sternohyoid, with low F0 and some of these studies
suggest that the sternohyoid is actively involved in lowering F0.
It has also been suggested, however, that the sternohyoid is
involved with jaw opening, and that the reported pitch-lowering
effects may actually be the result of jaw opening. To investigate
this question an EMG experiment was conducted on one speaker of
American English under normal and clenched jaw conditions. The
normal utterances were of the form "Bev loves Bob" with emphasis on
the various words. The clenched jaw data were obtained while the
subject held his jaw fixed by biting on a tongue depressor and
intoned the corresponding intonation patterns with a fixed vowel
carrier /a/. The results indicate that the strap muscle activity
for the normal utterances is very similar to the activity for the
same intonation pattern with the jaw clenched. Strap muscle
activity thus seems to be more closely related to pitch effects
than to jaw-opening effects.

This paper reports on one aspect of a continuing study to determine the
physiological correlates of changes in fundamental voice frequency (F0).
Specifically, we investigate the sternohyoid muscle, one of several extrinsic
laryngeal muscles, and its role in controlling F0. Several electromyographic
(EMG) studies with speech have reported an association of strap muscle
activity, particularly the sternohyoid, with low F0, and some of these studies
suggest that the sternohyoid is actively involved in lowering F0
(Faaborg-Andersen, 1965; Ohala, 1970; Ohala and Hirose, 1970; Atkinson, 1973;
Collier, 1975; Erickson, 1975). It has also been suggested, however, and
there has been some supportive data, that the sternohyoid plays a role in some
articulatory gestures involving jaw opening and that the reported pitch
lowering effects may actually be the result of jaw opening (Ohala, 1972;
Ohala and Hirose, 1970; Harris, 1971).

*A version of this paper was presented at the 92nd meeting of the Acoustical
Society of America, San Diego, California, November, 1976.

†Special Projects Department, Naval Underwater Systems Center, New London,
Connecticut.

Figure 1 gives a simplified schematic representation of the relevant
anatomy. First, the thyroid cartilage and larynx as a whole are suspended
from the hyoid bone by the strap muscles (hence, their name). Clearly,
contraction of these muscles can affect the thyroid cartilage in either the
front-back or in the vertical direction. Any such movement could change the
length and tension of the vocal cords and hence their rate of vibration (F0).
The exact mechanism involved is still not clear, although several possible
explanations have been suggested.

The figure shows only one of the supra-hyoid muscles, for simplicity,
the digastric muscle, although there are other muscles (such as geniohyoid
and mylohyoid) in this group. Both the strap muscles supporting the larynx
and the jaw opening muscles attach to the hyoid bone. As seen in Figure 1,
contraction of the digastric creates a force that pulls the hyoid bone
upward. To allow jaw opening, there must be an opposing downward force to
stabilize the hyoid and give the jaw opening force something to pull against.
Thus, it has been suggested that the sternohyoid and/or other strap muscles
contract to supply this force and allow jaw opening.

To investigate this question, an EMG experiment was conducted on one
speaker of American English under normal and clenched jaw conditions. The
normal utterances were: "Bev loves Bob," with emphasis on various words.
The clenched jaw data were obtained while the subject held his jaw fixed by
biting on a tongue depressor, and intoned the corresponding pitch patterns
with fixed vowel carrier /a/. An example is "BEV loves Bob" with the
corresponding clenched jaw form "AH hah hah." A direct comparison of
sternohyoid activity for the same pitch pattern with and without jaw opening
effects was obtained.

Table 1 lists the utterances used.

TABLE 1: Test utterances.

NORMAL              CLENCHED JAW

BEV loves Bob.      AH hah hah.
Bev LOVES Bob.      ah HAH hah.
Bev loves BOB.      ah hah HAH.

EMG data were obtained from the sternohyoid muscle using hooked wire
electrodes and then recorded for processing and analysis using the Haskins
Laboratories EMG facility.

[Figure 1 diagram. Labels: jaw opening; digastric m. (ant.); digastric m.
(post.); mastoid process; hyoid bone; thyrohyoid m.; sternohyoid m.;
sternothyroid m.; larynx; sternum; pitch lowering.]

Figure 1: Simplified schematic representation of the muscles involved in
pitch lowering and jaw opening.

[Figure 2 plot. Traces: sternohyoid activity and pitch (F0) for "BEV loves
Bob."]

Figure 2: Comparison of sternohyoid muscle activity for normal and clenched
jaw versions of an utterance having the same intonation pattern.

[Figure 3 plot. Traces: sternohyoid activity and pitch (F0) for the normal
utterance "BEV loves Bob."; annotation: segmental similarity, ρ = 0.3.]

Figure 3: Comparison of sternohyoid muscle activity for two utterances
having the same segmental phonemes but different intonation patterns.

The major results are given in Figure 2, which shows sternohyoid
activity for the normal utterance "BEV loves Bob," and for the clenched jaw
version with the same F0 pattern "AH hah hah." In comparing these two
utterances we are, in effect, holding pitch constant, and any differences in
muscle activity must be a result of articulatory and jaw opening effects.
Although there are some timing differences, as seen in this figure, it is
quite clear that the sternohyoid activity is very similar for both the normal
and clenched jaw versions. In fact, even with the timing differences the
waveforms have a correlation coefficient of 0.7. Thus, no noticeable jaw
opening effect is shown.

In Figure 3 we compare sternohyoid activity for the normal utterances
"BEV loves Bob" and "Bev LOVES Bob." Here we effectively have the same
segmental and jaw opening effects but very different F0 patterns. Any
differences in muscle activity thus would seem to be caused by pitch
differences.

Clearly, the muscle activity is less similar here than in Figure 2 (the
correlation coefficient is only 0.3). Thus, a clear pitch effect is seen.
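
The text does not say how the correlation coefficients for Figures 2 and 3
were computed or how the curves were aligned. Assuming they are ordinary
product-moment correlations between two equally sampled activity curves, the
calculation could be sketched as follows. The sample values are invented for
illustration and are not the recorded data.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equally sampled curves."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("curves must have the same length (at least 2 samples)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0:
        raise ValueError("a constant curve has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Invented activity envelopes (arbitrary units): the same shape, with the
# second curve delayed by one sample to mimic a timing difference.
normal = [1, 2, 6, 9, 5, 2, 1, 1]
clenched = [1, 1, 2, 6, 9, 5, 2, 1]

print(round(pearson(normal, clenched), 2))
```

A pure timing offset between otherwise identical curves lowers the
coefficient, which is why a value of 0.7 "even with the timing differences"
is read in the text as strong similarity.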

To summarize, utterances having the same pitch pattern regardless of
articulatory differences show very similar sternohyoid activity. Utterances
having the same articulatory and jaw opening gestures (but different pitch
patterns) show very different sternohyoid activity. We conclude, therefore,
that sternohyoid activity is more closely related to pitch effects than to
jaw opening effects, at least in this speaker. We are presently extending
the study to other speakers in order to test the generality of these
conclusions.

The Geniohyoid and the Role of the Strap Muscles*
Donna Erickson, Mark Liberman† and Seiji Niimi



ABSTRACT 

Many investigators have noted a relationship between strap
muscle activity and pitch lowering, but there does not seem to be
any single generally accepted theory to account for this connection.
The particular effect of strap muscle contraction will
depend in part on what other forces are acting on the hyoid bone;
therefore, in the context of a general EMG investigation of English
intonation, we recorded from a suprahyoidal muscle, the geniohyoid,
as well as the strap muscles (sternohyoid, sternothyroid, and
thyrohyoid) and the cricothyroid. In our data, the three strap
muscles show nearly identical patterns of activity; as a group,
their activity shows a strong negative correlation with the activity
of the geniohyoid and the cricothyroid. Examination of the
relationship of these muscles' activity to F0 levels showed the
cricothyroid and geniohyoid to have a positive relation to F0, and
the sternohyoid (selected as a representative strap muscle) to have
a slightly negative relation to F0. These findings are related to
the development of a possible model for the relative motion of the
larynx during pitch changes.

EXPERIMENT

It is known that the strap muscles [sternohyoid (SH), sternothyroid (ST)
and thyrohyoid (TH)] are active during low and falling F0 (Ohala, 1970; Ohala
and Hirose, 1970; Atkinson, 1973; Collier, 1975; Erickson, 1976). The
suprahyoidal muscles have not previously been investigated with respect to
their role in F0 control in speech; yet, the anatomical arrangement of
extrinsic laryngeal musculature is such that an effect of the strap muscles
with respect to F0 will certainly depend in part on suprahyoidal forces as
one can see by referring to Figure 1. In view of these considerations, we
examined the EMG activity from a representative suprahyoidal muscle, the
geniohyoid (GH), as well as from the three strap muscles (the SH, ST, and
TH), and the cricothyroid (CT) in the context of a larger EMG experiment on
English intonation.

We will present data from this experiment that bear on the following
two questions:

1. What overall relationship do the activities of the strap muscles,
cricothyroid, and geniohyoid bear to each other?

2. What relationship do the activities of these muscles bear to F0
levels?

*The paper was presented at the 92nd meeting of the Acoustical Society of
America, San Diego, California, 15-19 November 1976.

†Bell Laboratories, Murray Hill, N.J.

[HASKINS LABORATORIES: Status Report on Speech Research SR-49 (1977)]

[Figure 1 diagram. Labels: hyoid bone; thyrohyoid; sternothyroid;
sternohyoid; sternum; one further label only partly legible ("...hyoid").]

Figure 1: Extrinsic laryngeal muscles. It is hypothesized that the larynx
tends to move as a whole in an arc, as shown by the arrow.

Eight sentences, from 9 to 14 syllables long, with various stressing
and intonational patterns, repeated eight times, were examined. We have only
recorded a single speaker thus far. The quantitative data we will present in
this paper are drawn from a subset of the total, but the results are
qualitatively valid for the entire experimental run.

Results



Relationships among the muscles. Pearson product-moment correlation
coefficients were calculated for the various muscles using the Haskins
Laboratories computer-implemented correlation program, and the results can be
seen in Figure 2. The activity of the three strap muscles positively
correlates with each other, and negatively correlates with the geniohyoid and
cricothyroid. The activity of the geniohyoid and cricothyroid positively
correlates with each other. Although quantification of the correlation of
intrastrap muscle activity in terms of correlation coefficients has not been
presented heretofore in the literature, the data here agree with the findings
of Erickson (1976). The finding of a negative relation between the activity
of the strap muscles and the cricothyroid has been reported previously in
other EMG studies (Atkinson, 1973; Collier, 1975; Erickson, 1976). The
positive relationship between the activity of the CT and the GH has not been
explored previously. We are currently investigating this with respect to
possible physiological correlates of stress in English.

Relationship of muscle activity to F0 levels. In order to ascertain the
relationship of these muscles to F0 levels, we concentrated on two key
syllables (the intonational "head" and "nucleus") in two repetitions of the
sentence "It's nothing less than a masterpiece," spoken on five intonational
patterns [see Liberman (1975)]. We compared root mean square of the
integrated EMG activity for these syllables to mean F0 100 msec later. This
delay appears to be approximately appropriate for the contraction time of the
laryngeal muscles (Sawashima, 1974; Atkinson, in press). The RMS values were
calculated from EMG recordings sampled at every 5 msec. The F0 values were
calculated from the voiced part of the syllable.
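
The measurement just described pairs an RMS value of EMG activity over a
syllable with the mean F0 taken 100 msec later. The sketch below shows one
way such a pairing could be computed for signals sampled every 5 msec. The
function names, the frame bookkeeping and the sample values are illustrative
assumptions and do not describe the Haskins processing system.

```python
import math

SAMPLE_PERIOD_MS = 5   # EMG sampling interval given in the text
LAG_MS = 100           # delay between EMG activity and its F0 effect


def rms(samples):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def emg_f0_pair(emg, f0, start, end, lag_ms=LAG_MS, period_ms=SAMPLE_PERIOD_MS):
    """Pair EMG RMS over frames [start, end) with mean voiced F0 lag_ms later.

    emg and f0 are frame-aligned lists; unvoiced F0 frames are None.
    """
    shift = lag_ms // period_ms
    voiced = [v for v in f0[start + shift:end + shift] if v is not None]
    if not voiced:
        raise ValueError("no voiced frames in the lagged window")
    return rms(emg[start:end]), sum(voiced) / len(voiced)


# Invented data: a 50-msec burst of EMG activity followed, 100 msec later,
# by a voiced stretch at 110 Hz.
emg = [0.0] * 10 + [3.0, 4.0] * 5 + [0.0] * 30
f0 = [None] * 30 + [110.0] * 10 + [None] * 10

level, mean_f0 = emg_f0_pair(emg, f0, start=10, end=20)
print(level, mean_f0)
```

Dropping unvoiced frames before averaging mirrors the statement that F0 was
calculated from the voiced part of the syllable only.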

The GH activity has a clear positive relationship to F0 above about 105
Hz. This interesting observation leads to speculation about possible GH
function as an auxiliary pitch-raising mechanism for high F0, when the CT
needs an extra "boost" to raise F0, as in stressed syllables. We are
investigating this further. The CT activity shows a positive relation to F0,
and in this respect agrees with several other EMG studies (for example,
Atkinson, 1973; Collier, 1975; Erickson, 1976). The results are shown in
Figure 3. The SH activity shows a tendency towards a negative correlation
with F0, and this too agrees with the findings of other studies (Atkinson,
1973; Collier, 1975; Erickson, 1976).

Figure 2: Correlation coefficients for the activity of the various muscles
as spoken by one speaker on two repetitions of eight sentences.

[Figure 3 caption, beginning missing] ...tions of the sentence "It's nothing
less than a masterpiece,"

Figure 4: Relation of muscle activity in terms of root mean square values
for the same syllables examined in Figure 3.

In addition to looking at the relationship of EMG activity to F0 level,
it is also interesting to look at the EMG activity for the particular
syllables as shown in Figure 4. The relationship between GH activity and CT
activity has not been explored previously, but holds considerable interest
for future investigation. The relationship between CT activity and SH
activity appears similar to that shown for Thai syllables (Erickson, 1976).

Physiological Implications

These findings can be related to a general picture of the motion of the
larynx during changes in F0. There appears to be a tendency for the larynx
as a whole to move in an arc, as shown by the arrow in Figure 1: motion
forward and up being generally associated with pitch raising, and motion back
and down with pitch lowering. This seems to be substantiated by
cineradiographic evidence that shows the hyoid bone moving up and forward
during high pitch (Faaborg-Andersen and Sonninen, 1960; Colton and Shearer,
1971). (This, of course, is dependent on the head position and holds true
only when the head is in the upright position seen in Figure 1.) The result
of this upward and forward motion of the hyoid bone is to pull the thyroid
cartilage up and forward. Since the cricoid cartilage would tend to remain
relatively fixed (due to its connection with the constrictor muscles and the
trachea), it would tend to resist the forward component of the motion. The
result of this relative motion (rotation or translation) between the two
cartilages at the cricothyroid joint would tend to lengthen the vocal folds,
as the larynx moves up and forward, and relax them as it moves back and down.
A paper describing this view in more detail is now in preparation.

As a final remark, we wish to say that this is a preliminary
investigation and the speculations and physiological findings introduced in
this paper are being explored further with a view toward application to
current theories of intonation and stress in English.

Syllable Synthesis*
Ignatius G. Mattingly†



ABSTRACT

A scheme for synthesis by rule based on the phonetic syllable
is described. A syllable-feature specification of the utterance to
be synthesized determines a pattern of articulatory influences;
these influences in turn determine the parameter values of the
synthesizer.

For quite a long time, as the first slide (Figure 1) may remind you, my
colleagues at Haskins Laboratories have been insisting that speech is a code,
and that the encoding unit is the phonetic syllable. As the result of the
merging of various coarticulatory influences, the correlates of the phonemes
at the acoustic level are, in the vivid phrase of Liberman, Cooper,
Shankweiler, and Studdert-Kennedy (1967), "overlapped or shingled, one onto
another," yielding "irreducible segments of approximately syllabic
dimensions." This observation should, indeed, be generalized to include the
articulatory level as well (MacNeilage and DeClerk, 1968). In this view of
the syllable [which of course goes back at least to Stetson (1951)], my
colleagues have been encouraged by the findings of Kozhevnikov and Chistovich
(1965).

But the appeal of the phonetic syllable as an encoding unit does not
rest merely on empirical observations as to the unsegmentability of anything
smaller. There is not time to make the theoretical case for the syllable at
length, but I would at least point out how nice it would be if it were
possible to order freely the units of an ideal phonetic transcription at each
prosodic level. Because of phonotactic restrictions, this condition is
clearly out of the question if these units are conventional phonetic
segments, but seems quite reasonable if the units are phonetic syllables.
Though overlap between the physical manifestations of adjacent syllables
occurs, the principle of free ordering in the transcription will be preserved
as long as such overlap is predictable from the specification of the
individual syllables.

From this point of view, the syllable is a cyclic process, passing from
onset to peak to offset as the vocal tract moves from a more closed to a more
open to a more closed configuration. The process can be realized in many
different ways, depending upon the [...] choices of the speaker. In the
ideal phonetic transcription of an utterance, those choices are the values
assigned to syllable features of the sort suggested by Fujimura¹ (1975,
1976), and phonetic segments have no special status. Thus, not only the
difference between [pa] and [...] but also the difference between [pa] and
[pla] depend upon a feature [...]. The articulatory and acoustic
consequences of a particular [...] are determined by the syllabic process
and may, in principle, extend throughout the physical manifestation of the
syllable.

*This paper was presented at the 92nd Meeting of the Acoustical Society of
America, San Diego, California, 15-19 November 1976.

†Also University of Connecticut.

[HASKINS LABORATORIES: Status Report on Speech Research SR-49 (1977)]

Figure 1: Interaction of consonantal and vocalic influences in [teg]. After
Liberman (1972).

I hope it is clear that I am speaking of phonetic rather than of
phonological syllables. If there are such things as phonological syllables
(the matter is unsettled), they do not necessarily correspond one-to-one with
phonetic syllables, any more than phonemes correspond one-to-one with phones.
And, by the same token, if phonological syllables do not exist, the case for
phonetic syllables is unaffected. A representation in terms of syllable
features at the phonetic level is [...] entirely consistent with a
segmental representation at the phonological level, and would not necessarily
entail any fundamental revision of generative phonology. One of the
motivations for generative phonology, it [...] phonological units do not
necessarily correspond with phonetic units (Chomsky, 1964).

What I have been saying places [...] explanatory burden on the concept
of the phonetic syllable, and will be credible only to the extent that
syllabic processes can be shown [...] explicit. Thus, the
case for the syllable will be [...] phonotactic restrictions, as well
as much of what is now regarded as [...] variation, can be interpreted
as arising naturally from [...] of syllables. Synthesis by
rule is an attractive tool for [...].



Recently, we have begun to work at Haskins on a new synthesis-by-rule
program. In this new scheme, the phonetic syllable has a central role. The
input to the synthesis program is a feature transcription of an utterance in
syllabic rather than segmental [...]. At present, the features are
binary, which simplifies the [...], but there is no strong commitment
to binarity at the phonetic level. A series of ordered rules relates the
feature values of the transcription to the variables used in the routine that
calculates parameter values for [...] software simulation of OVE III
(Liljencrants, 1968). Since the [...] phonology, it is not
at present a practical vehicle for synthesizing quantities of text. Nor has
consideration yet been given to [...] intonation, though the syllable
plays a crucial role in these [...].

¹O. Fujimura (1976) Syllable [...] of speech synthesis. Unpublished
memorandum, Bell Laboratories.

In the routine for calculating parameter values, the character of a
syllable is considered to be determined by numerous influences: the vowels
of the previous, current and following syllables, the final consonants of the
previous and current syllables, and the initial consonants of the current and
following syllables. With each influence is associated a set of target
parameter values and a curve that represents the extent of the influence over
time. An influence curve is a modified exponential function of the form
$\kappa e^{\beta t}$. In this function [similar to the one used by Lindblom
(1963) in his well-known model of consonant-vowel coarticulation], the
coefficient $\kappa$ determines the effective time of onset of an influence,
and the exponent $\beta$ determines its rate of growth. On phonetic grounds,
these properties of the function are appealing, since one would expect both
the shape and the relative timing of the various influence curves to be
significant variables. Of course, there are other functions that might
possibly have been used instead. The value of the function is restricted to
the range 0...1, since it is used as a weight, and at a certain time $t(x)$
after the notional beginning of the syllable cycle, $\beta$ becomes negative,
so that the influence will begin to diminish. The target values, and the
values of $\kappa$, $\beta$, $t(x)$, and other variables are assigned by the
rules.
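
As a rough illustration of the influence curve just described, the following
Python sketch evaluates a weight of the form $\kappa e^{\beta t}$, keeps it
within the range 0...1, and lets it diminish once $t(x)$ has passed. The
function name, the numerical values, the use of milliseconds, and the exact
form of the decay (the same exponent with its sign reversed, starting from
the clipped value reached at $t(x)$) are assumptions made for the example;
the text says only that $\beta$ becomes negative.

```python
import math


def influence_weight(t, kappa, beta, t_x):
    """Weight (0..1) of one influence at time t after the notional syllable start.

    kappa, beta and t_x stand for the coefficient, the exponent and the
    turning point t(x) of the text; milliseconds are an assumed unit.
    """
    if t <= t_x:
        # Growth phase: modified exponential kappa * e^(beta * t).
        raw = kappa * math.exp(beta * t)
    else:
        # Decay phase (assumed form): start from the clipped value reached
        # at t_x and apply the exponent with its sign reversed.
        peak = min(1.0, kappa * math.exp(beta * t_x))
        raw = peak * math.exp(-beta * (t - t_x))
    # The function is used as a weight, so it is kept inside 0..1.
    return max(0.0, min(1.0, raw))


# Hypothetical values: a smaller kappa gives a later effective onset.
sample_times = (0.0, 30.0, 80.0, 100.0)
early = [influence_weight(t, 0.01, 0.1, 80.0) for t in sample_times]
late = [influence_weight(t, 0.001, 0.1, 80.0) for t in sample_times]
```

With these illustrative numbers the curve with the smaller coefficient is
still close to zero at 30 msec, which is the sense in which reducing
$\kappa$ postpones the onset of an influence.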

It might be objected that the notion of an "influence" simply reintroduces
the phonetic segment in a new guise, particularly when I refer to the
influences of consonants and vowels, and employ the conventional terms for
manner classes. But unlike phonetic segments, influences are not linearly
ordered; their temporal relationship is more complex than that. And
"consonant," "vowel," and the various manner class terms are to be understood
not as segment categories, but as labels given to various recognizable
aspects of the syllabic cycle by which they are defined.

Because of our particular interest in the temporal patterns of events
within the syllable, we have provided various ways to control these patterns
in the program.² As we have just seen, $\kappa$ controls the effective onset
of an influence; by manipulating this variable, different degrees of
consonantal and vocalic coarticulation may be provided. Since the moment when
an influence begins to diminish is a variable, articulatory holds for stops
and fricatives can be represented. Moreover, each influence can potentially
increase the duration of the syllable by a certain amount. If such an
increment is called for, the onsets of syllable-final and following-syllable
influences are postponed by appropriately reducing their $\kappa$ values.

²[...] of syllable duration are reported in a paper read [...]
earlier session (Shockey and Mattingly, 1976).

The actual parameter values for a particular 5-msec sample of speech are
derived by an iterative calculation. The influences are regarded as ordered,
from vowels to fricatives to stops. At each iteration, the value computed
for a parameter is

$$V_i = V_{i-1} + I_i (T_i - V_{i-1})$$

that is, the weighted sum of the target value associated with the influence
and the value computed at the preceding iteration, the relative weighting
being determined by the value of the influence function at that point in the
syllable. (At the first iteration, the target value for the vowel of the
previous syllable serves as the seed value $V_0$.) Because of the large
number of influences, the burden of calculation would be considerable were it
not that at any one time, many potential influences are inactive, that is,
have values near zero, and may simply be ignored.
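
The iteration can be sketched in a few lines of Python. This is only an
illustration of the update rule above, not the program itself: the function
name, the cut-off below which an influence is treated as inactive, and the
formant values are all hypothetical.

```python
def blend_parameter(seed_value, influences, inactive_below=1e-3):
    """Apply V_i = V_(i-1) + I_i * (T_i - V_(i-1)) over ordered influences.

    influences: (weight, target) pairs, assumed to be already ordered from
    vowels to fricatives to stops. inactive_below is an assumed cut-off for
    influences whose weight is near zero and may be ignored.
    """
    value = seed_value
    for weight, target in influences:
        if weight < inactive_below:
            continue  # inactive influence: skip the arithmetic entirely
        value = value + weight * (target - value)
    return value


# Hypothetical second-formant targets (Hz) for one 5-msec sample.
f2 = blend_parameter(
    seed_value=900.0,          # target of the previous syllable's vowel (V_0)
    influences=[
        (0.75, 1100.0),        # current vowel
        (0.0005, 2300.0),      # inactive influence, skipped
        (0.5, 1800.0),         # a consonantal influence
    ],
)
```

Because each step moves the running value toward the next target by the
fraction given by its weight, later influences in the ordering (here the
consonantal one) have the last word when they are fully active.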

The next three slides (Figures 2, 3, 4) illustrate how the overlapping
of various influences is realized. This slide (Figure 2) shows an [a]
assumed to have been preceded by [o] and followed by [e]. The curve with
black circles in the upper portion of the slide shows the increasing
influence of the [a] at the expense of the [o] of the preceding syllable.
The curve with white circles shows the increasing influence of the [e] of the
following syllable at the expense of the [a]. The lower portion of the slide
shows the target formant frequencies for all three vowels and the
formant movements resulting from their influence.

In this slide (Figure 3) the influence of a final palatal glide is
interposed, in addition to the other influences, to give the diphthong [ai],
and the formants change accordingly.

Finally, in Figure 4, the influences of an initial [y] glide in the [ai]
syllable and of an initial [w] glide in the following syllable are
superimposed upon the other influences.

This way of calculating parameter values will be recognized as a
generalization of the method used by Holmes, Mattingly and Shearme (1964) and
by the earlier Haskins programs for calculating formant transitions
(Mattingly, 1968a, 1968b; Kuhn, 1973), in which the "boundary value" used as
a basis for interpolation was the weighted sum of the target frequencies of
two adjacent phones. It is also analogous, as Tim Rand has pointed out, to a
series of filters, each of which corresponds to an "influence."

The scheme, as described so far, is quite general, and could be
implemented in terms of articulatory gestures, or vocal tract shapes, or
formant movements, depending upon the choice of parameters. The most
interesting and satisfying implementation would be the articulatory one, but
because we are anxious to explore temporal questions as soon as possible, we
are beginning with an acoustic version.


Articulatory Movements in VCV Sequences
Thomas Gay



ABSTRACT 

The purpose of this experiment was to study both the timing
and positional properties of articulatory movements in VCV utterances.
Conventional cinefluorographic techniques were used to
track the movements of the upper lip, lower lip, jaw, tongue tip,
and tongue body of two speakers who read randomized lists of VCV
utterances containing the vowels /i,a,u/ and the consonants
/p,t,k/ in all possible combinations. Results showed that the
timing of articulatory movements in a VCV sequence is constrained
by the intervocalic consonant, even if the gesture for the consonant
is not a contradictory one. Anticipatory movements toward the
second vowel always begin during the closure period of the intervocalic
consonant. The appearance of carryover coarticulation effects
depends on the phonetic identity of the particular segment or
degree of involvement of the articulator. Carryover effects, like
anticipatory effects, did not extend beyond an immediately adjacent
segment. These findings suggest that the rules governing the
segmental input to a speech string might be simpler than present
models suggest.



INTRODUCTION



The purpose of this paper is to explore a number of questions related to
the properties of articulatory movements in VCV utterances. The experiment
was motivated by the fact that in the literature there exist contradictory
reports concerning the nature and extent of various coarticulatory phenomena.
While the traditional view, and the earlier papers of Ohman (1966) and
Daniloff and Moll (1968), for example, hold that coarticulation is inherent
in the programming of speech sequences, and that its effects can extend
across various structural boundaries, other more recent studies (Gay, 1974a,
1974b; Bell-Berti and Harris, 1975) suggest that the rules governing
coarticulation (both anticipatory and carryover) might be somewhat simpler
than previously believed.

Anticipatory coarticulation effects are essentially timing effects:
movements toward some parts of a feature target of a given segment begin
before others. In a study of anticipatory lip rounding, Kozhevnikov and
Chistovich (1965) found that the onset of the rounding gesture for the vowel
/u/ placed in a CCV syllable occurred at the beginning of the syllable.
Daniloff and Moll (1968), in extending the observations of Kozhevnikov and
Chistovich, showed that lip rounding for /u/ can begin as many as four
segments ahead of the vowel. In their experiment, anticipation of lip
rounding for the vowel /u/ was studied for a number of mono- and disyllabic
single and two-word utterances embedded in sentence frames using lateral view
cinefluorography. Onset of lip rounding usually began during the closure
phase of the first consonant in the sequence, and was not affected by the
position of word or syllable boundaries within the sequence. Another type of
anticipatory coarticulation was shown to exist by Ohman. In a spectrographic
study of coarticulation in VCV sequences, Ohman showed that the variability
observed in transition movements to the consonant could be predicted by the
formant frequencies of the second vowel. This led Ohman to conclude that
vowel-to-vowel movement in a VCV is essentially diphthongal with the
consonant simply superimposed on the basic gesture; in other words, movements
toward the second vowel begin independently from and at about the same time
as those toward the consonant. In other studies, Moll and Daniloff (1971)
showed that velopharyngeal opening for a nasal consonant can begin two vowels
in advance of the consonant, and McClean (1973) showed that in a CVVN
sequence, velar opening for the final nasal begins ahead of the syllable
boundary, unless the two vowels are separated by a marked junctural boundary.
These studies, among others, suggest that articulatory encoding is a complex
phenomenon whose effects can spread across several adjacent segments. Most
studies support, either explicitly or implicitly, Henke's (1966) articulatory
model that proposes the operation of a mechanism that scans future segmental
inputs, or features thereof, and sends commands for the immediate attainment
of those feature targets that would not interfere with the attainment of
immediately intervening articulations.

However, in several recent studies, both electromyographic (Gay, 1974b;
Ushijima and Hirose, 1975) and acoustic (Ohde and Sharf, 1974; Bell-Berti and
Harris, 1975), evidence was used to argue against the ubiquity of
anticipatory coarticulation effects in speech. In an experiment by Gay
(1974b), EMG recordings were obtained from the genioglossus and orbicularis
oris muscles of two subjects during the production of various VCV syllables.
In those utterances where the genioglossus muscle was involved in the
production of both the first and second vowels (as in /upi/ or /itu/), or
where the first and second vowels were the same (as in /ipi/ or /utu/), a
cessation of activity occurred for the genioglossus muscle during the time of
consonant production. In other words, each vowel in the sequence (even in a
symmetrical VCV) was marked by a separate muscle pulse. This finding was
interpreted as reflecting a discontinuity in vowel-to-vowel movement, and
thus as contradicting Ohman's (1966) diphthongal movement hypothesis. Another
finding of this experiment was the presence of a trough in the orbicularis
oris envelope during the production of an alveolar or velar consonant that
separated two rounded vowels. This finding was not consistent with others
that showed a considerably earlier onset of the lip rounding gesture
(Kozhevnikov and Chistovich, 1965; Daniloff and Moll, 1968; Benguerel and
Cowan, 1974). In another EMG experiment, Ushijima and Hirose (1975) showed
that in a CVVN sequence, lowering of the velum in anticipation of the final
nasal was restricted by the syllable boundary. While these results were
obtained from Japanese, they nonetheless argue against a general model of
anticipatory velar lowering.

In an experiment performed by Bell-Berti and Harris (1975), spectrographic
measurements were made from eighteen utterance types that consisted
of the vowels /i,a,u/ in CVC combinations with the consonants /p,t,k/. The
data showed that the effects of the terminal consonant on the midpoint of the
stressed vowel were not nearly as large as those of the initial consonant; in
other words, the carryover effect of the initial consonant on the vowel is
considerably greater than the anticipatory effect of the second consonant.
The same results were also obtained independently by Ohde and Sharf (1974):
in a variety of CVC sequences, carryover coarticulation effects on vowel
targets were likewise greater than anticipatory effects.

Carryover coarticulation effects are essentially positional effects and
exist in the form of variability in target (or target feature) positions as a
function of changes in phonetic context. Carryover effects have traditionally
been attributed to mechanical or inertial effects and, in general, have
been studied less extensively than anticipatory effects. Although carryover
effects have been shown to exist at both the EMG and articulatory levels
(MacNeilage and DeClerk, 1969; Sussman, MacNeilage, and Hanson, 1973; Gay,
1974c), the pervasiveness of these effects is somewhat in doubt. In a study
of the production of thirty-six CVC monosyllables, MacNeilage and DeClerk
(1969) found that some aspect of the production of every phone was always
influenced by a preceding phone and almost always influenced by a following
phone. In particular, the size of the EMG signal would be different
depending on the identity of the adjacent vowel or consonant. In countering
the argument that a motor command representation of the phone shows less
variability than an articulatory target representation, MacNeilage (1970)
later proposed that the observed EMG variability reflected a complex motor
strategy, the underlying goal of which is a relatively invariant articulatory
end. The concept of an articulatory-based target system as proposed by
MacNeilage was further supported, at least for vowels, by the
cinefluorographic data of Gay, Ushijima, Hirose, and Cooper (1974) and Gay
(1974a). In the latter study, lateral view x-ray motion pictures were
obtained from two speakers who produced the vowels /i,a,u/ in a variety of
VCV contexts. The results of this experiment showed that for both subjects,
the target positions for both /i/ and /u/, in both pre- and postconsonantal
positions, remained quite stable (within 2-3 mm) across changes in the
consonant and transconsonantal vowel. Finally, a careful examination of
Ohman's (1966) acoustic data shows that carryover effects of the first vowel
or the intervocalic consonant on the formant frequencies of the second vowel
were virtually nonexistent: formant frequencies fell within a 50-60 Hz range
regardless of the identity of the preceding phones. However, in contrast to
the studies cited above, carryover effects have been shown to exist at the
articulatory level. Sussman, MacNeilage, and Hanson (1973) and Gay (1974c),
for example, have produced data showing jaw position during consonant and
vowel production to be sensitive to the degree of jaw opening of an adjacent
phone. Thus, although evidence exists to support an articulatory target
formulation, no present theory specifies the rules governing failure to
achieve a particular target.

The divergent research results of the last ten years, whether arising
from differences in interpretation or the utilization of different
experimental techniques, nevertheless serve to point out that a number of
important questions concerning the dynamic properties of speech gestures
remain unanswered. In this experiment, both the timing and positional
properties of articulatory movements in VCV utterances were studied, using
conventional pellet tracking and spectrographic techniques, in an attempt to
provide answers to some of these questions. The format of the experiment was
designed to explore questions related to two particular issues: 1) the
constraints an intervening consonant might place on the movements of the
articulators, especially the tongue body, from one vowel to another (is the
movement from vowel to vowel essentially diphthongal or is it locked somehow
to the intervocalic consonant?) and 2) the extent of carryover coarticulation
effects throughout the syllables (are such effects limited to phonetically
unmarked features such as jaw position or do they extend to other properties
of both vowel and consonant production?).

METHOD

Subjects and Speech Material

Subjects were two adult males, both native speakers of American English.
The speech material consisted of CVCVC strings where the initial and final
consonants remained constant (/k/ and /p/, respectively), and the medial VCV
sequences contained the vowels /i,a,u/ and the consonants /p,t,k/ in all
possible combinations. Each of the twenty-seven utterances was placed in the
carrier phrase, "Say ___ again," and random-ordered into a master list.
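
The count of twenty-seven follows from crossing three first vowels, three
consonants, and three second vowels. A minimal Python sketch of how such a
list could be enumerated and shuffled is given below; the function names, the
plain-letter spelling of the strings, and the fixed random seed are
assumptions for illustration and say nothing about how the original master
list was actually prepared.

```python
import random
from itertools import product

VOWELS = ("i", "a", "u")
CONSONANTS = ("p", "t", "k")


def vcv_sequences():
    """All medial VCV sequences: 3 vowels x 3 consonants x 3 vowels."""
    return [v1 + c + v2 for v1, c, v2 in product(VOWELS, CONSONANTS, VOWELS)]


def master_list(seed=0):
    """CVCVC strings with constant initial /k/ and final /p/, in random order.

    The seed is a hypothetical choice that only makes the example repeatable.
    """
    items = ["k" + vcv + "p" for vcv in vcv_sequences()]
    random.Random(seed).shuffle(items)
    return items


utterances = master_list()
```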

Data Recording

Lateral view x-ray films were recorded with a 16 mm cine camera at a
speed of 60 fps. The x-ray generator delivered 1 msec pulses at 120 kV to a
nine-inch image intensifier tube. For purposes of tracking articulatory
movements, 2.5 mm lead pellets were attached to the upper and lower lips,
tongue tip, dorsum, and body (at two locations) of both subjects.¹ In
addition, a reference pellet was attached at the embrasure of the upper
central incisors. Jaw movements for both subjects were tracked by measuring
the distance between the tip of the lower central incisors and the reference
pellet. All pellets were attached at the midline using a cyanoacrylate
adhesive. The locations of the pellets are shown for both subjects in Figure
1.

Each subject was positioned in a head holder. The subjects were
instructed to read the list at a comfortable speaking rate and with equal
stress placed on the two syllables. A brief practice session preceded each
run. During the x-ray run, the corresponding acoustic signal was also
recorded on magnetic tape.

¹The second, more posterior, tongue body pellet for Subject GNS fell off
during the experimental run.

Figure 1: Locations of pellets for tracking articulatory movements, shown
for Subject FSC and Subject GNS. Jaw movements were measured at the tip of
the lower central incisors.

Data Analysis

A semiautomated system for analyzing the x-ray data was developed for
this purpose. It consists essentially of a 16 mm film analyzer
(Perceptoscope, Mark III) and digitizing tablet (Summagraphics) that is
interfaced to a small laboratory computer (D.E.C., PDP-8/E). The film image
is projected, frame by frame, via an overhead mirror system onto the surface
of the digitizing tablet. The position coordinates of each pellet (or other
anatomical landmark) are stored in the computer when a hand-held pen is
depressed over the pellet location. Sections of the tablet outside the image
area are used for control operations, for example, storing a special skip
code or indicating end of utterance. The computer measures the X and Y
coordinate positions of each pellet relative to the position of the reference
pellet and stores the accumulated data, frame by frame and utterance by
utterance, on disk. A second program is used to display the X and Y
components separately as a movement track on a large display scope. The
resolution of the digitizing tablet is .25 mm. By projecting the film twice
real size, measurement error is easily reduced to within ±1 mm. This was the
usual maximum real size error obtained from repetitive measurements of
selected samples.
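
The coordinate bookkeeping described above can be sketched as follows. This
is a hypothetical reconstruction in Python, not the laboratory program: it
assumes that tablet coordinates are in millimetres, that the projection
factor is exactly 2, and that positions are simple differences from the
reference pellet; the function names and sample coordinates are invented.

```python
def real_size_position(pellet_xy, reference_xy, magnification=2.0):
    """Pellet position relative to the reference pellet, in real-size mm.

    pellet_xy and reference_xy are (x, y) tablet coordinates in mm for one
    film frame; magnification is the assumed projection factor.
    """
    dx = (pellet_xy[0] - reference_xy[0]) / magnification
    dy = (pellet_xy[1] - reference_xy[1]) / magnification
    return (dx, dy)


def real_size_step(tablet_resolution_mm=0.25, magnification=2.0):
    """Smallest real-size distance that one tablet step can represent."""
    return tablet_resolution_mm / magnification


# Invented tablet readings for a single frame.
position = real_size_position((130.0, 84.0), (100.0, 90.0))
step = real_size_step()
```

Under these assumptions one tablet step corresponds to 0.125 mm at real size.
That figure is only the quantization of the tablet; it is distinct from the
measurement error reported above, which presumably also reflects how
repeatably the pen can be placed on a pellet image.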

One particular problem inherent in x-ray pellet tracking techniques is
the obstacle dental fillings present in marking pellet locations. Because of
the density of amalgams, the pellets become lost when they pass behind such
fillings. Dental restorations interfered with the tracking of the first
tongue body pellet of Subject FSC and the tongue body and tongue tip pellets
of Subject GNS, both to varying degrees in different utterances.

Wide band spectrograms, using a Haskins Laboratories digital spectrograph
routine, were made for all utterances. A particular advantage of this
routine is a software thresholding feature that can be used to reduce the
background noise produced by the x-ray generator. This permitted
spectrographic measurements to be made for almost all of the vowel nuclei,
although the less intense parts of the signal associated with formant
transitions were lost in the noise.

The acoustic recordings of both subjects were analyzed for the purpose
of determining whether stress differences appeared for the first and second
vowels. Perceived destressing occurred consistently for /a/ in
preconsonantal position for Subject GNS. Destressing of preconsonantal /a/
was also evident in the spectrographic measures. First and second formant
frequencies for /a/, pooled across consonants and vowels, were 640 Hz and
1340 Hz for the initial position, and 810 Hz and 1210 Hz for the final
position. Instances of first vowel destressing for /a/ also occurred for
Subject FSC, but not consistently. These were the only stress effects that
appeared for either subject.

The Timing of Articulatory Movements

One of the basic questions [...] in this experiment concerns the
coordination of articulatory movements throughout a VCV utterance, that is,
the relative timing of the movements of the tongue body in relation to those
of the lips, jaw, and tongue tip, especially during the production of the
intervocalic consonant. The three [...] consonants appearing in the
various utterances were selected on the basis of the varying degrees of
involvement of the tongue during their production: complete independence as
a primary articulator for /p/, [...] tongue tip involvement for /t/, and
complete involvement of the tongue body as a primary articulator for /k/. As
will be shown, however, tongue body movements are either involved in or
constrained by each of the three intervocalic consonants.

Measurements of the relative timing of articulatory movements in the
various VCV sequences are summarized in Figure 2. This figure shows the
ranges of onset times of tongue body, jaw, and primary articulator movements
(either the lower lip, tongue tip, or tongue body for /p,t,k/ respectively),
from the first vowel to the intervocalic consonant, and from the intervocalic
consonant to the second vowel. [...] times relative to the time of
closure for the consonant and [...] separately for the three
consonants. These data provide [...] of the relative timing of
articulatory movements throughout the [...].

For both subjects, the timing of articulatory movements from the first
vowel to the consonant was far more constrained than articulatory movements
from the consonant to the second vowel. For closing movements, the onset
times of tongue body, jaw, and primary articulator movements fell within the
same overall time window. While the [...] window, itself, is rather wide,
coordination within the window is [...] constrained, with the movements
of the tongue body, jaw, and primary articulator beginning within 10-15 msec
of each other. The observed onset time variability could not be attributed to
either the duration of consonant closure or the identity of the first vowel,
although there was some tendency for earlier starting times to occur for /a/,
probably as a function of greater [...] displacement. It should also
be noted that in a number of cases, notably those sequences where the
first vowel is /u/, closing movements of the primary articulator were not
accompanied by corresponding movements of either tongue body or jaw.²

In contrast to the constrained [...] movements from the first vowel to
the consonant, opening from the consonant to the second vowel was
characterized by a staggered pattern of movements. For both subjects, opening
toward the second vowel began with the tongue body, and was followed by the
jaw and primary articulator, in that order. Movements of the tongue body
began anywhere from 5-50 msec for Subject FSC, and 5-60 msec for Subject GNS,
after the time of consonant closure. All tongue body movements, however, were
underway before the time of consonant release. The onset time of jaw opening
also varied within the interval of consonant closure, but usually followed
tongue movements and preceded primary articulator movements. The variability
of opening onset times, like that for closing, did not correspond to any
feature other than a tendency for earlier opening to occur for a following
open vowel.

²When the intervocalic consonant was /p/ or /t/, tongue body participation
in the consonant gesture depended on the identity of the first vowel;
tongue body movements always accompanied primary articulator movements when
the first vowel was /a/, sometimes showed movement when the first vowel was
/i/, and never showed movement when the first vowel was /u/. For /k/, of
course, the tongue body always showed movement into the consonant. In those
cases where tongue body movements did not appear for the consonant, the
tongue body simply maintained the position of the first vowel.

Figure 2: Ranges of relative onset times of articulatory movement associated
with consonant closing and vowel opening. The vertical lines
indicate mean values.

The dynamic properties of articulatory movements in a VCV sequence and 
the rules that govern those movements, wilT be discussed for each consonant 
category using graphical illustrations produced from the frame-by-frame
measurements of the x-ray films. The movements of the tongue body, lips, and
jaw for a VCV sequence where the intervocalic consonant is /p/ are
illustrated for both subjects in Figure 3. This figure shows the movement
track of the height dimension for the sequence /ipa/. Each track was graphed
from discrete points measured every film frame, that is, at approximately 17
msec intervals. Measurements begin during the closure period of the initial
/k/ and end at the time of closure for the final /p/; 0 on the abscissa
corresponds to the time of consonant closure. This figure illustrates the
constraints that the intervocalic consonant places on the timing of the
tongue body from vowel to vowel. The movement of the tongue body from the
first vowel to the second vowel does not begin until after closure for the
intervocalic consonant is completed. This, of course, was a salient feature
in the production of all VCV utterances by both subjects (ref. Figure 2).
This figure also shows that the movements of the tongue body begin ahead of
those for the jaw. The delay time is approximately 40 msec for Subject FSC
and 60 msec for Subject GNS. This delay suggests that tongue body movements
toward the vowel are probably independent of jaw movements toward the
vowel. This figure also illustrates the variability of jaw movements
associated with consonant production. For Subject FSC, jaw closing begins at
the time of lip closing, while jaw opening precedes lip opening. For Subject
GNS, on the other hand, jaw closing does not accompany lip closing and jaw
opening follows lip opening (this pattern is the only exception to the
general rule). As is also evident in this figure, upper lip contributions to
lip closure were negligible for both subjects. Finally, Subject FSC showed a
pattern of lip closure that was often characterized by continued compression
throughout the closure period.
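
The timing comparison described above (tracks sampled once per film frame,
time measured from consonant closure, tongue body leading the jaw) can be
sketched as follows. This is a minimal illustration only: the displacement
threshold used to mark movement onset, the function names, and the track
values are assumptions made for the example, not the measurement criterion
or the data of the experiment.

```python
FRAME_INTERVAL_MS = 17.0  # approximate frame interval given in the text


def frame_times(n_frames, closure_frame, interval_ms=FRAME_INTERVAL_MS):
    """Time of each film frame in msec, with 0 at the frame of closure."""
    return [(i - closure_frame) * interval_ms for i in range(n_frames)]


def movement_onset(track_mm, times_ms, threshold_mm=0.5):
    """Time of the first frame after closure whose position differs from the
    position at closure by more than threshold_mm (None if there is none).
    The threshold is an assumed criterion, chosen only for illustration."""
    closure_index = times_ms.index(0.0)
    baseline = track_mm[closure_index]
    later_times = times_ms[closure_index + 1:]
    later_positions = track_mm[closure_index + 1:]
    for t, position in zip(later_times, later_positions):
        if abs(position - baseline) > threshold_mm:
            return t
    return None


# Hypothetical height tracks (mm), one value per frame; not measured data.
tongue_body = [0.0, 0.0, 0.1, 0.0, 0.1, 0.8, 1.6, 2.4, 3.0]
jaw = [0.0, -0.1, 0.0, 0.0, 0.0, 0.1, 0.2, 0.9, 1.7]

times = frame_times(len(tongue_body), closure_frame=3)
tongue_onset = movement_onset(tongue_body, times)
jaw_onset = movement_onset(jaw, times)
print(tongue_onset, jaw_onset, jaw_onset - tongue_onset)
```

With a 17 msec frame interval, any onset or delay estimated this way is
resolved only to the nearest frame, which is worth keeping in mind when
delays of 40 to 60 msec are compared.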

Consonant constraints on vowel-to-vowel movements are as evident in the
front-back dimension as in the height dimension. Figure 4 shows tongue
movement in the X dimension plotted against the same baseline as lower lip
movement in the Y dimension, as a function of time for the sequence
/ipu/. Again, it is apparent that tongue movement toward the second vowel
does not begin until after consonant closure. The data for Subject GNS also
show what might be a tongue body gesture associated with the consonant. Such
a gesture, however, did not appear regularly in the data, nor did the tongue
body appear to reach a specific, repeatable target position when such a
gesture did appear.

Figure 3: Movement tracks for utterance /ipa/. 0 on the abscissa, in this
and all subsequent figures, corresponds to time of consonant
closure; vertical bars indicate the times of consonant closure and
consonant release. The tongue body pellet for Subject FSC is the
second, more posterior, one.

Figure 4: Movement tracks for utterance /ipu/. Both height and fronting
measurements are plotted on the same baseline.



The same rules for tongue body movement associated with /p/ are also
evident for utterances where /t/ is the intervocalic consonant (Figure 5).
Here, as before, movements toward the second vowel do not begin until after
closure for /t/. Also, this figure shows that the movements of the tongue
body, tongue tip, and jaw are again independent from each other; they all
begin moving into the second vowel at different times, with the tongue body
leading the jaw and tongue tip, in that order.

Perhaps the best illustration of consonantal constraints on tongue body
movements is one where the first and second vowels of the utterance are the
same. Figure 6 shows the movement tracks for the jaw and four tongue pellets
during the production of /iti/ for Subject FSC. Instead of the tongue
maintaining the /i/ target during the consonant, the tongue blade and both
tongue body pellets show movement throughout the consonant gesture. The
blade and anterior tongue body pellet appear to shadow movements of the tip,
while the posterior tongue body pellet moves in the opposite direction
(lower). Because the tongue body is displaced at least 5 mm from the vowel
target during the time of consonant production, the movement is probably not
passive (a pressure perturbation, for example). Rather, it would seem that
the gesture is a facilitory one or one that reflects a strategy to modulate
the degree of aspiration that might otherwise occur if the postalveolar
channel were too constricted.¹ It should also be noted that the present
finding agrees with the x-ray data of Kent (1970) that also showed tongue
body movement in a symmetrical VCV at the time of consonant production.

The most interesting tongue movements are those associated with /k/
production. Figure 7 shows both the height and fronting components of tongue
body movement during the production of /aki/, /aka/, and /aku/, for Subject
FSC. These traces show that the tongue body is in continuous movement
throughout the closure phase of the consonant. From the time of /k/ closure,
the tongue body continues to move upward and forward for a following /i/ or
/a/, and upward and slightly backward for a following /u/. Continuous
movement of the tongue body during /k/ production has also been reported in a
number of other papers. The data of both Kent (1970) and Perkell (1969) show
elliptical patterns of movement of the tongue body for /k/ in symmetrical
/VkV/ and /akV/ sequences, respectively. A similar pattern exists in the
present symmetrical /VkV/ sequences and would emerge from the /aka/ data in
Figure 7 if a composite trace were constructed from the two movement tracks.
The present data are also in general agreement with those of Houde (1967), who
showed that the tongue body was in continuous movement during /k/ in an
asymmetrical /VkV/ sequence.

Of particular interest in the present data is the finding that,
irrespective of the identity of the second vowel in the sequence, closure for
/k/ occurs at approximately the same location in the vocal tract. Tongue
movement continues through the consonant, with release occurring at different
locations in anticipation of the following vowel (ref. Figure 7). While the
three movement tracks are within 3 mm of each other, in both dimensions, at
closure, they diverge towards release, at which point the differences are 8
mm between /i/ and /a/ in the height dimension, and 10 mm between /i/ and /u/
in the fronting dimension. Thus, consistent with the data for both /p/ and
/t/, the data for /k/ show anticipatory movements to be locked to the closure
phase of the consonant.

¹K. N. Stevens: personal communication.

Figure 5: Movement tracks for utterance /ita/. The tongue body pellet for
Subject FSC is the second, more posterior, one.

Figure 6: Movement tracks (height) for utterance /iti/, Subject FSC.

Figure 7: Movement tracks (height and fronting) for the second tongue body

For VCV utterances containing either /p/, /t/, or /k/ as the intervocalic
consonant, the usual sequence of articulatory events is as follows.
Movements of the jaw, tongue body, and primary articulator begin at about the
same time, with jaw closing continuing past the time of occlusion for the
consonant. Shortly after closure for the consonant occurs, tongue body
movement toward the second vowel begins. This movement is followed
independently by jaw opening and release of the consonant. Articulatory
movements for the postvocalic vowel always begin between the time of
consonant closure and consonant release.

The data of this experiment, in showing consonant constraints on vowel
movement in a VCV utterance, are not consistent with Ohman's (1966)
hypothesis that vowel-to-vowel movement in a VCV sequence is essentially
diphthongal. Ohman's hypothesis is based on the assumption that tongue body
movements toward the second vowel begin at about the time of onset of closing
for the consonant. However, the present data show that movement toward the
second vowel begins much later, some 5-60 msec after closure for the
consonant has already been completed. This pattern of movement even occurs
for /VpV/ sequences, where the tongue body is not actively involved in the
production of the intervocalic consonant. These data suggest that either the
tongue body itself attains a target during consonant production, or more
likely, that the release of the consonant and the movement toward the vowel
are linked in a basic gesture.

In addition to questions concerning anticipatory movements of the tongue
body, it was expected that the data of this experiment could be used to track
the onset of lip rounding for a rounded vowel preceded by a variety of
different phones. Lateral view x-rays can provide an indication of lip
rounding in the form of degree of lip protrusion. Unfortunately, however,
this measure was not a very sensitive one for the two speakers used in this
experiment. The difference in protrusion between the spread vowel /i/ and
the rounded vowel /u/ averaged only 5 mm for both speakers. It might be
noted, though, that in no case did evidence of a protruding gesture appear for
the rounded second vowel in any of the VCV utterances until after closing for
the intervocalic consonant was completed.

To summarize the data thus far: the relative timing of articulatory
movements in a VCV sequence is affected by the intervocalic consonant, even
if the gesture for the consonant is not a contradictory one. The
intervocalic consonant affects both tongue body and jaw movements toward the
second vowel. Anticipatory movements toward the second vowel always begin
during the closure period of the intervocalic consonant, suggesting that the
CV component of the VCV sequence might be organized as a basic unit.

The Attainment of Articulatory Targets

Carryover coarticulation effects were studied in relation to both the
influence the first vowel exerts on the position of the intervocalic
consonant and the influence the intervocalic consonant exerts on the
attainment of the target for the second vowel.

In contrast to timing measurements, useful positional measurements for
/p/ could not be obtained. The important positional information for /p/
appears primarily in the coronal plane; lateral view x-rays simply do not
reveal this information. However, the present data do show a rather strong
vowel effect on jaw position during /p/. Figure 8 illustrates this effect
for both subjects. These plots, which agree with the data of Sussman,
MacNeilage, and Hanson (1973) and Gay (1974c), show that the position of the
jaw during the production of /p/ is sensitive to the openness of the adjacent
vowel: greater jaw opening for the consonant occurred with a more open
adjacent vowel. This figure also shows what is presumed to be a stress
effect in the data of Subject GNS. Jaw opening (and consequently tongue
height) for /a/ is reduced in the preconsonantal position.

Carryover effects of the first vowel on the positional properties of /t/
did not appear in either the tongue tip or jaw measurements. Figure 9
illustrates the insensitivity of tongue tip position for /t/ to different
preceding vowels, in both the height and fronting dimensions. It is apparent
that neither the retrusiveness of /u/ nor the openness of /a/ had any
measurable effect on the /t/ target, in either dimension. The only
differences in the three traces appear in the timing of the closing
movements. Since the onset of closing is earliest for /a/ and latest for
/u/, the differences are presumed to be displacement related. Finally, jaw
movements for /t/, unlike those for /p/, were not affected by the openness of
the preceding or following vowel.

The most interesting and extensive carryover effects of the first vowel
on consonant production appeared in the movement track of the tongue body
during /k/ production. This is illustrated in Figure 10 for the VCV sequence
where /i/ is the common second vowel. Here the predicted effect of different
first vowels is evident. At the time of closure for /k/, the tongue body is
higher and more fronted for /i/, and progressively lower and more retruded
for /u/ and /a/. The magnitude of these effects is on the order of 7 mm
between /i/ and /a/ in the height dimension, and 5 mm between /i/ and /a/ in
the fronting dimension. The most interesting feature of this graph, however,
is that the carryover effects of the first vowel do not extend far into
consonantal closure. On the contrary, the three curves converge before
consonant release at about the time movement begins toward the second vowel.
The relative invariance of the movement from consonant release toward the
second vowel further strengthens the suggestion that the CV transition is
produced as an integral unit.
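
The convergence described above can be expressed as the largest pairwise
distance between the tongue body positions of the three utterances at a given
moment: large at closure, small near release. The sketch below shows one way
to compute such a spread. The coordinates are hypothetical values chosen to
make the arithmetic easy to follow; they are not taken from Figure 10, and
the function name is an assumption.

```python
from itertools import combinations
from math import hypot


def spread_mm(points):
    """Largest pairwise distance (mm) among (fronting, height) positions."""
    return max(hypot(a[0] - b[0], a[1] - b[1])
               for a, b in combinations(points, 2))


# Hypothetical tongue body positions (fronting, height) in mm for three
# utterances sharing the second vowel /i/; not measured data.
at_closure = {"iki": (0.0, 0.0), "aki": (-3.0, -4.0), "uki": (-3.0, 0.0)}
near_release = {"iki": (1.0, 1.0), "aki": (0.4, 0.2), "uki": (1.0, 0.2)}

closure_spread = spread_mm(list(at_closure.values()))
release_spread = spread_mm(list(near_release.values()))
print(closure_spread, release_spread)
```

A spread that shrinks between closure and release is what the text calls
convergence of the three curves; a spread that stayed constant would indicate
that the carryover effect persisted through the consonant.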

Carryover effects of the first vowel on the production of the
intervocalic consonant were variable: they could not be adequately measured
for /p/, they did not appear for /t/, but did appear, in a predictable way,
for /k/. The jaw effect evident for /p/ is apparently due to the secondary
importance of jaw closure in bringing about lip closure for /p/. Although
closure for /p/ can have both lower lip and jaw components, the jaw component
is probably facilitory and, as such, sensitive to phonetic environment.
Likewise, the difference in effects for /t/ and /k/ is presumably related to
the differences in degree of involvement of the tongue body during the
production of the two consonants.

Figure 8: Movement tracks of jaw opening for /ipi/, /apa/, /upu/, both
subjects.

Figure 9: Movement tracks for tongue tip position, Subject FSC.

Carryover effects of the intervocalic consonant on the following vowel
appeared only for the open vowel /a/, and were reflected in differences in
jaw and, consequently, tongue body height. These effects, which are
consistent with those reported by Gay (1974a), are illustrated in Figure 11.
This figure shows the differences in tongue body and jaw height for the vowel
/a/ when the intervocalic consonant varies from /p/ to /t/. Opening for the
vowel is greater when the intervocalic consonant is /p/, as opposed to /t/.
The difference in tongue body height for the first vowel is probably due to
differences in stress between the two utterances. However, this was not
apparent when listening to the tapes. This figure also shows what appears to
be tongue body involvement during the production of /t/. The movement track
for the tongue body shows greater elevation than that for the jaw during the
time of consonant production. This means that the tongue body position
during consonant production is not simply being carried passively by the jaw,
but rather has an active muscle component underlying it as well. Although
variability in tongue body and jaw opening appeared in the articulatory data
for both subjects, similar variability was not reflected in the corresponding
acoustic measures. Apparently, the differences in jaw position as measured
anteriorly at the incisors either do not correspond to the size of the
pharyngeal constriction for /a/, or are much less when the arc of rotation is
measured closer to the hinge axis of the jaw.

Carryover effects of a preceding consonant on the production of the
vowels /i/ and /u/ were small. These effects are summarized in Figure 12 and
Table 1. The figure shows the relative positions of the upper lip, lower
lip, jaw, and tongue body at the time the tongue body reached its target
(point of maximum displacement) for each of nine utterances containing the
vowel /i/ in final position. Table 1 shows the corresponding values of the
first and second formant frequencies at that point in time.



TABLE 1: First and second formant frequency values (Hz) for the vowel /i/ in
nine different VCV utterances. Each utterance number corresponds
to that of Figure 12.

                  Subject FSC       Subject GNS
Utterance          F1      F2        F1      F2

1. ipi            340    2030       310    2230
2. api            360      --       320    2250
3. upi            360    2220       300    2160
4. iti            360    2220       330    2200
5. ati            320    2120       340    2210
6. uti            350    1990       320    2120
7. iki            320    2210       320    2270
8. aki            360    2160       320    2160
9. uki            350    2190       320    2250


Figure 11: Movement tracks of tongue body (pellet 2) and jaw height for /apa/
and /ata/, Subject FSC.

Figure 12: Coordinate positions of upper lip, lower lip, jaw, and tongue body
for the target positions of the vowel /i/, both subjects. Each
utterance number corresponds to the utterance in Table 1.



As is evident in the figure, variability of tongue body target positions
is minimal (2.3 mm for Subject FSC and 3 mm for Subject GNS). Lower lip and
jaw positions, on the other hand, vary within a larger range, approximately 5
mm for Subject FSC and 10 mm for Subject GNS. Interestingly, lower lip and
jaw targets seem to vary independently from tongue body positions, but covary
for both subjects. This finding contradicts that of Hughes and Abbs (1976),
who showed that mouth opening for /i/ remained relatively constant because of
trade-offs between lower lip and jaw displacements. This type of equivalence
was not evident in the present data, for either /i/ or /t/. Differences
between the two sets of data might be attributable to differences in either
or both the speech material and instrumental methods used in the two
experiments.
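
The contrast drawn above between a trade-off and covariation can be stated
in terms of the sign of the correlation between jaw and lower lip
displacements across utterances: a trade-off that holds mouth opening
constant implies a negative correlation, whereas covariation implies a
positive one. The sketch below illustrates this with a plain Pearson
correlation. The displacement values are hypothetical and the variable names
are assumptions; nothing here reproduces the measurements of either study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical displacements (mm) across five utterances; not measured data.
jaw = [1.0, 2.0, 3.0, 4.0, 5.0]
lip_trade_off = [5.0, 4.0, 3.0, 2.0, 1.0]   # jaw + lip stays constant
lip_covarying = [2.0, 3.0, 4.0, 5.0, 6.0]   # lip rises and falls with jaw

r_trade_off = pearson_r(jaw, lip_trade_off)
r_covarying = pearson_r(jaw, lip_covarying)
print(r_trade_off, r_covarying)
```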

The acoustic measurements of target formant frequencies showed some
variability among the nine utterances (Table 1). First formant frequencies
were within a range of 40 Hz for both subjects, while second formant
frequencies fell within a range of 230 Hz for Subject FSC, and 120 Hz for
Subject GNS. The measured acoustic variability did not appear to correspond
to any observed articulatory variability. For example, utterances 2 and 7
for Subject FSC were characterized by similar articulatory target points, but
quite different formant frequencies. Conversely, utterances 3 and 4, and 1
and 9, were characterized by virtually the same formant frequencies, but
different articulatory target points. Either the variability observed fell,
for the most part, within the range of measurement error, or more likely, a
four-point parameterization tracking procedure of the type used in this
experiment is simply inadequate for the purpose of relating differences in
articulatory target points to the acoustic output. It might also be noted
that acoustic variability for both /u/ and /a/ was, in terms of percentage,
within the same range as variability for /i/.
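
The ranges quoted above are simply the difference between the largest and
smallest value across the nine utterances. As a worked check, the sketch
below applies that definition to the Subject FSC columns of Table 1 as they
can be read in this copy. The second formant value for utterance 2 is not
legible here and is left as None rather than guessed, so the result is a
check under that assumption and not a restatement of the original table.

```python
def value_range(values):
    """Largest minus smallest value, ignoring entries recorded as None."""
    known = [v for v in values if v is not None]
    return max(known) - min(known)


# Subject FSC, utterances 1-9, as read from Table 1 in this copy (Hz).
# None marks the one entry that could not be read.
fsc_f1 = [340, 360, 360, 360, 320, 350, 320, 360, 350]
fsc_f2 = [2030, None, 2220, 2220, 2120, 1990, 2210, 2160, 2190]

f1_range = value_range(fsc_f1)
f2_range = value_range(fsc_f2)
print(f1_range, f2_range)
```

Under this reading the two ranges come out at 40 Hz and 230 Hz, the figures
given in the text for Subject FSC.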

Carryover effects, then, when they do appear, are unlike anticipatory
effects in that they depend on the phonetic identity of the particular
segment. Like anticipatory effects, however, carryover effects seem to
spread no farther than the neighboring phone. These findings support an
articulatory based formulation of speech production (MacNeilage, 1970). For
the most part, an articulatory target corresponded to a relatively invariant
representation of a phoneme. Articulatory variability, when it did occur,
did so only under special circumstances. First, carryover effects for a
consonant are reflected mainly in variability of jaw position, and only when
the jaw is not primarily involved in the production of the phone, as in /p/.
However, when the jaw is more tightly involved in the production of a phone
(/t/ for example), degree of jaw opening was not sensitive to that of the
adjacent phone. The only other strong carryover effect appeared in tongue
body movements for intervocalic /k/. Here, unlike variability in jaw
opening, carryover effects on tongue body movements do not seem to be either
random in appearance or inertial in origin. Unlike /VpV/ and /VtV/ sequences,
where the tongue body is usually in a waiting position before it moves toward
the second vowel during consonant closure, in a /VkV/ sequence the tongue
body is involved as a primary articulator in the production of the consonant.
The movements of the tongue body through /k/ (Figure 10) seem to be directed,
in a straight-line fashion, to a common target position for release of the
consonant. The data for /k/ provide a fairly convincing illustration of the
limited spreading effects of coarticulation in a VCV sequence. Because of
continuous tongue body involvement in the production of VCV syllables
containing /k/ as the intervocalic consonant, the elements of these
syllables, especially /k/ itself, should be the most sensitive to the
spreading of coarticulation effects in both directions. Yet, the
assimilation of carryover effects and the onset of anticipatory movements
both occur within the closure period of the consonant, with movements from
the same vowel into /k/ (ref. Figure 7), or movements toward the same vowel
from /k/ (ref. Figure 10), not being affected by the articulatory event on
the other side of the consonant.

Stability of tongue body targets for vowels (at least /i/ and /u/) was
also the rule rather than the exception. The only substantial articulatory
variability occurred in jaw displacement, with /a/ showing the greatest
effects and /u/ the least. As was mentioned before, however, variability in
jaw displacement for /a/, as measured anteriorly at the incisors, might be
either exaggerated or irrelevant in relation to variability that might exist
in the pharyngeal constriction for /a/. Likewise, the variability of maximum
jaw displacement for both /i/ and /u/ seems unrelated to the variability
observed in the position of the tongue body for those vowels. Thus, the two
features, tongue body height and jaw displacement, might be independent ones,
with jaw opening being a facilitory gesture and an unmarked phonetic feature.
This formulation suggests a reevaluation of models of vowel articulation that
specify jaw position as a primary determiner of tongue height (Lindblom and
Sundberg, 1971).

SUMMARY AND CONCLUSIONS

The major findings produced by this experiment are as follows. First,
anticipatory movements toward the second vowel in a vowel-stop
consonant-vowel sequence begin during the closure period of the intervocalic
consonant. This restricted coarticulatory field includes both tongue body and
jaw movements associated with the second vowel. Furthermore, the size of this
field is not affected by the identity of the intervocalic consonant. Second,
like anticipatory effects, carryover effects did not extend beyond an
immediately neighboring segment. Unlike anticipatory effects, however, the
appearance of carryover coarticulation effects depended on the phonetic
identity of the particular segment on which these effects might act.

The implication of these findings is that the rules governing the
segmental input to a VCV string might not be as complex as present models
suggest. The finding that anticipatory movements begin and primary carryover
effects end at about the same time during the closure period of the
consonant suggests that the release of the consonant and movement toward the
vowel are organized and produced as an integral articulatory event.

This formulation, which specifies a syllable-sized articulatory unit, is
not consistent with the operation of a phoneme based scan-ahead mechanism.
This does not necessarily mean, however, that a scan-ahead mechanism does not
operate on larger units or at another stage of the speech production process.
For example, Lindblom and Rapp (1973), Nooteboom and Cohen (1975), and
Fromkin (1971) have suggested the existence of an anticipatory mechanism in
the temporal formulation of speech sequences. Likewise, the complex
reordering of commands accompanying changes in speaking rate (Gay, Ushijima,
Hirose, and Cooper, 1974) also suggests that the temporal features of a
segment might be known in advance.

Thus, while it has traditionally been considered that the serial
ordering of segments is governed by complex rules whose effects can spread
across several adjacent segments, and the temporal control of speech is
governed by a simple adjustment of timing of commands to the articulators
(Lindblom, 1973), it may well be that the reverse is true: the segmental
input to the speech string is governed primarily by simple rules that act
upon syllable-sized units, while the temporal formulation of the string
requires complex articulatory adjustments based on advance information
obtained from a higher level scan-ahead mechanism.

Like most studies of speech organization, especially those using
high-speed cinefluorographic techniques, the results of this experiment are
based on data obtained from a relatively small subject population and are
applicable to the production of only a few phonetic elements, themselves
constrained by the artificial format in which they were placed. Thus, the
findings of this experiment are obviously far from conclusive, and go only
part way toward answering those questions posed at the outset. The present
findings can serve, however, as a basis for examining or reexamining a number
of questions concerning the organization of segmental gestures. For example,
it was shown that a four-point parameterization procedure for relating
articulatory targets to acoustic targets is inadequate. In order to resolve
the differences between the acoustic data of Ohman (1966) and the
articulatory data of the present study, formant tracking must be matched to a
far more comprehensive multipoint parameterization of the vocal tract. The
present results also suggest, without providing convincing evidence, that the
onset of anticipatory lip rounding might be conditioned differently in CCCV
and VCV sequences; also, they raise further questions about the use of
trade-offs between tongue and jaw movements in achieving articulatory
targets, and the importance of jaw position in determining tongue height in
vowel articulation.

Measuring Laterality Effects in Dichotic Listening*
Bruno H. Repp



ABSTRACT 

This paper discusses methodological issues and problems related
to measuring laterality effects in dichotic listening. Section
1 describes the standard dichotic two-response paradigm as well as
a number of indices of the ear advantage proposed in the literature.
The numerical range of most of these indices is constrained
by performance level; only one particular index avoids these
constraints. However, this does not necessarily make this index
the optimal one. A correction for guessing, an issue that has been
neglected in the past, is proposed. Analogies to signal detection
theory are discussed, as well as the theoretical and empirical
criteria for choosing the "correct" index of laterality. The index
called eg is proposed as the best solution given the present state
of knowledge. Section 2 discusses the phenomenon of dichotic
fusion and the dichotic single-response paradigm, which offers many
methodological advantages over the two-response paradigm. Section
3 discusses the factors of ear dominance and stimulus dominance in
the perception of fused stimuli. An index of ear dominance is
derived by taking advantage of analogies to signal detection
theory. In Section 4, a number of remaining problems are
discussed: stimulus intelligibility, guessing and selective
attention, blend responses, test reliability, validity, and homogeneity.

INTRODUCTION 



Since Kimura's (1961) demonstration of an average right-ear advantage
(REA) in the recognition of dichotic verbal stimuli, many researchers have
used dichotic listening tasks to measure hemispheric dominance for language.
Kimura's interpretation that hemispheric dominance for language underlies the
ear asymmetries has had almost universal acceptance. While some studies have
been content with diagnosing the mere direction of the average ear advantage
(left or right) and testing its significance, many recent studies have
attempted to compare different individuals, different tests, or different
experimental conditions with respect to the observed magnitude of the ear
asymmetry. Underlying these attempts has been the belief that cerebral
lateralization, like handedness, is a matter of degree and can be measured on
a continuous scale (Zangwill, 1960; Shankweiler and Studdert-Kennedy, 1975).

*A slightly revised version of this paper is now in press in the Journal of
the Acoustical Society of America.

In order to yield meaningful and reliable measurements, dichotic testing
must meet certain formal and methodological requirements that have been given
relatively little attention in the past. If dichotic listening tasks are
used as instruments to measure the degree of hemispheric dominance for
language, they must satisfy the same high standards of construction,
procedure, and scoring as any other psychological test. These standards may be
derived from methodologically oriented research in the laboratory,
theoretical analyses of the task situation, and general test-theoretical
principles. Many of these requirements are not sufficiently met by dichotic
tests as they are now used.

The present paper summarizes the issues that must be handled in
constructing a good dichotic test to measure hemispheric dominance. The
dichotic listening situation is remarkably complex. In the discussion that
follows, I provide some suggestions, but point out many problems that need
further investigation or have not been dealt with at all in the past.
Although the discussion is restricted to dichotic listening, many of the
issues should apply to any situation in which lateral asymmetries are to be
measured (for example, tachistoscopic perception, binocular rivalry, or
ocular dominance experiments), and therefore may be of interest to a wider
audience.

The first focus of the present discussion is choosing a numerical index
of the ear advantage. This problem is fundamental to the measurement of
lateralization; unless it is solved, no meaningful comparisons between
subjects, tests, or experimental conditions are possible. In Section 1--
which heavily relies on earlier discussions by Halwes (1969) and Marshall,
Caplan, and Holmes (1975)--I discuss a number of indices that have been
proposed and used in the past in conjunction with the dichotic two-response
paradigm (that requires the listener to identify both stimuli in a dichotic
pair). Most of these indices fail to take into account the constraints
imposed by performance level on the range of differences between the scores
for the two ears. In addition, none of them corrects for guessing, despite
the fact that most dichotic studies use only a few different stimuli,
resulting in substantial guessing probabilities. After describing an index
that takes both performance level and guessing into account, I hasten to
point out that a correct index must be based on a correct theory and
empirical evidence of how scores for the two ears change with performance
level and how guessing operates. This theoretical and empirical basis is not
available at present. I describe an index that is based on plausible
assumptions, but the question whether it is the "correct" index remains open.

The second focus of the present paper is finding ways to simplify
dichotic testing and to circumvent some of the problems encountered in the
standard two-response paradigm. In Sections 2 and 3, I discuss an approach
to dichotic listening that in many ways seems simpler than the two-response
paradigm. This method, that requires only a single response to each dichotic
stimulus, relies on the phenomenon of dichotic (or binaural) fusion. In
Section 2, I discuss the factors that make two dichotic stimuli fuse more or
less completely into a single perceived stimulus, as well as the
methodological consequences of such fusion. Section 3 derives an index of the
ear advantage for the single-response paradigm. In the course of deriving the
index, I discuss the phenomenon of stimulus dominance (perceptual dominance
of one stimulus over the other in a fused dichotic pair) that exerts
constraints on the ear score difference similar to that exerted by
performance level in the two-response paradigm. I illustrate how these
constraints can be dealt with and how they actually become a crucial factor
in deriving an unbiased index of the ear advantage.

Section 4 is devoted to a survey of additional topics and problems in
dichotic testing: stimulus intelligibility, selective attention, blend
responses, test reliability, homogeneity, and validity. Since my concern in
this paper is exclusively methodological, I avoid any discussion of the
physiological factors that may underlie dichotic ear advantages. My aim is to
develop methods for measuring the dichotic ear advantage with maximum
precision. Before we can attempt to answer the more fundamental questions
about the structures and processes underlying the ear asymmetry, we must be
able to obtain valid and reliable measurements from dichotic tests. There is
much room for improvement in existing methods with respect to that goal.

1. LATERALITY INDICES IN THE TWO-RESPONSE PARADIGM
1.1. The Method

In the two-response paradigm, two different stimuli are simultaneously
presented to the two ears, and the subject is asked to identify both,
typically without any constraint on the order of report. The two responses
must be different from each other, and guessing is encouraged. This is the
standard situation that will be considered in this section.

The results of a standard two-response test may be summarized in a 2 x 2
table, as shown in Table 1. The responses are scored as correct (that is,
identical with one of the stimuli) or incorrect, without regard to order.
The proportions of correct and incorrect responses are calculated separately
for each ear, so that the row sums in Table 1 are equal to 1.0.
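The scoring rule just described can be sketched in a few lines of Python.
This is only an illustrative sketch: the function name `score_two_response`,
the trial layout, and the example syllables are assumptions introduced here
and are not part of the original procedure.

```python
def score_two_response(trials):
    """Score a two-response dichotic test (illustrative sketch).

    Each trial is a tuple (left_stimulus, right_stimulus, responses), where
    `responses` holds the two labels reported by the listener (hypothetical
    data layout). A response counts as correct for an ear if it matches the
    stimulus presented to that ear, regardless of the order of report.
    """
    n_trials = 0
    left_correct = 0
    right_correct = 0
    for left, right, responses in trials:
        reported = set(responses)
        left_correct += left in reported
        right_correct += right in reported
        n_trials += 1
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    p_l = left_correct / n_trials
    p_r = right_correct / n_trials
    # Rows of the 2 x 2 table: (proportion correct, proportion incorrect).
    return {"LE": (p_l, 1.0 - p_l), "RE": (p_r, 1.0 - p_r)}


# Four made-up trials, for illustration only.
example_trials = [
    ("ba", "da", ("da", "ga")),
    ("pa", "ta", ("ta", "pa")),
    ("ga", "ka", ("ka", "ba")),
    ("da", "ba", ("ba", "da")),
]
table = score_two_response(example_trials)
```

Each row of the returned table sums to 1.0 by construction, as required for
the rows of the 2 x 2 table.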



TABLE 1: The data structure in the two-response paradigm.

                              Responses
                      Correct        Incorrect

            LE        P_L            1 - P_L            1.00
Channels
            RE        P_R            1 - P_R            1.00

                      P_L + P_R      2 - P_L - P_R      2.00



The overall performance level is defined as the average proportion of
correct responses per ear,

$$P_o = (P_R + P_L)/2 . \tag{1}$$

1.2. The Simple Difference Score (d)

The simplest index of the ear advantage is the difference between the
proportions of correct responses for the two ears,

$$d = P_R - P_L . \tag{2}$$

The vast majority of dichotic listening studies have reported the ear
advantage as d. (There is no commonly accepted name of the index; I call it
d here simply for notational convenience.) The index d is useful as a
descriptive statistic, but it has severe limitations when the results of
different subjects, different tests, or different experimental conditions are
to be compared. These limitations arise from the constraint imposed on
differences between proportions by their absolute size--a fact that is often
neglected and so constitutes one of the primary fallacies of descriptive
statistics. In the context of measuring laterality effects, Halwes (1969)
was the first to point out that the overall performance level P_o sets an
upper limit to d,

$$d_{\max} = \begin{cases} 2P_o & \text{if } 0.0 \le P_o \le 0.5 \\ 2(1 - P_o) & \text{if } 0.5 < P_o \le 1.0 \end{cases} \tag{3}$$

where d_max is the maximal value that d can assume at a given level of P_o,
and -d_max the corresponding minimal value. Figure 1a shows the triangular
function represented by Equation 3.

Thus, d indices of different subjects, tests, or experimental conditions
are not directly comparable unless the respective performance levels are
equal. Since, in general, performance levels are not constant from one
subject (test, condition) to another, comparisons of d indices are almost
certainly invalid. Many studies in the past have neglected this quite
elementary limitation of simple difference scores and, consequently, some of
these studies may have reached faulty conclusions.^1 I should point out that,
theoretically, it does remain conceivable that direct comparisons of d are
valid, after all--that is, that d is the correct index to use--but the
assumptions that would have to be made in order to justify such comparisons
are highly implausible (see Section 1.5). Hopefully, empirical evidence will
become available in the future to decide this issue objectively.

^1 Consider, for example, two subjects, A and B, with P_o = 0.6 and 0.8,
respectively. Assume that d = 0.5 for A and d = 0.4 for B. Who shows the
larger ear advantage? From a comparison of d indices, the answer would be
A. However, B never could have reached that index because of her higher
performance level that permits a maximal d of only 0.4. There is no reason
why B's better performance on the test should imply that she is less
lateralized than A. In fact, once performance level is taken into account,
it becomes clear that B shows the maximal d for her level of performance,
while A's index is considerably below the maximal d possible at P_o = 0.6
(d_max = 0.8). It therefore should be concluded that, contrary to the first
superficial impression, B shows a stronger REA than A.

Figure 1: The numerical ranges of four indices of the ear advantage as a
function of performance level.
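The comparison of subjects A and B in the footnote can be checked
numerically. The following Python sketch is an illustration only; the
function names are assumptions introduced here. It evaluates Equation 3 and
expresses each subject's d as a fraction of the ceiling at her performance
level.

```python
def performance_level(p_r, p_l):
    """Equation 1: average proportion of correct responses per ear."""
    return (p_r + p_l) / 2.0


def d_max(p_o):
    """Equation 3: largest possible |d| at performance level p_o (no guessing)."""
    if not 0.0 <= p_o <= 1.0:
        raise ValueError("p_o must lie in [0, 1]")
    return 2.0 * p_o if p_o <= 0.5 else 2.0 * (1.0 - p_o)


# Subjects A and B from the footnote, given as (P_o, d).
subjects = {"A": (0.6, 0.5), "B": (0.8, 0.4)}
for name, (p_o, d) in subjects.items():
    ceiling = d_max(p_o)
    print(f"{name}: d = {d:.2f}, d_max = {ceiling:.2f}, d/d_max = {d / ceiling:.3f}")
```

Subject B reaches the ceiling for her performance level, whereas subject A
reaches only 0.625 of hers, which is the point made in the footnote.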

1.3. "Correcting" for the Constraints of Performance Level

Several authors became aware of the limitations of d and proposed
alternative indices of the ear advantage (that were subsequently used by
others). All of these indices were intended to provide a measure of the ear
advantage that is independent of performance level, both theoretically and
empirically. Only the last of the four indices that I will discuss seems to
achieve this aim, but, as I will argue, it is still far from a final solution
to the problem of finding the optimal index of the ear advantage.

POC (percentage of correct [responses]) and POE (percentage of errors)
are two alternative indices suggested by Harshman and Krashen (1972).^2 They
are defined as

$$\mathrm{POC} = P_R/(P_R + P_L) , \tag{4}$$

$$\mathrm{POE} = (1 - P_L)/(2 - P_R - P_L) . \tag{5}$$

These indices range from 0 (perfect LEA) to 1 (perfect REA); an index of 0.5
means no ear advantage. For those who, like myself, prefer a scale ranging
from -1 (perfect LEA) to +1 (perfect REA)--and this is entirely a matter of
personal choice--corresponding POC' and POE' indices are obtained by a simple
linear transformation of POC and POE:

$$\mathrm{POC}' = 2\,\mathrm{POC} - 1 = (P_R - P_L)/(P_R + P_L) , \tag{6}$$

$$\mathrm{POE}' = 2\,\mathrm{POE} - 1 = (P_R - P_L)/(2 - P_R - P_L) . \tag{7}$$

^2 An analogous index was proposed by Studdert-Kennedy and Shankweiler (1970),
but their index was based on single-correct responses only. For a brief
discussion, see Repp (1977b).

The limitations of POC and POE as a function of performance level have
recently been competently discussed by Marshall et al. (1975). The analogous
limitations of POC' and POE' are illustrated in Figures 1b and 1c. In
formal terms, we obtain from Equations 3, 6, and 7,

$$\mathrm{POC}'_{\max} = \begin{cases} 1 & \text{if } 0.0 \le P_o \le 0.5 \\ (1 - P_o)/P_o & \text{if } 0.5 < P_o \le 1.0 \end{cases} \tag{8}$$

$$\mathrm{POE}'_{\max} = \begin{cases} P_o/(1 - P_o) & \text{if } 0.0 \le P_o \le 0.5 \\ 1 & \text{if } 0.5 < P_o \le 1.0 \end{cases} \tag{9}$$
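As a numerical check on Equations 8 and 9, the sketch below (Python; the
function names are assumptions introduced for illustration) evaluates POC'
and POE' at the most divergent pair of ear scores that a given performance
level permits, assuming a right-ear advantage and no guessing.

```python
def poc_prime(p_r, p_l):
    """Equation 6."""
    return (p_r - p_l) / (p_r + p_l)


def poe_prime(p_r, p_l):
    """Equation 7."""
    return (p_r - p_l) / (2.0 - p_r - p_l)


def most_divergent_scores(p_o):
    """Pair (P_R, P_L) with the largest difference at level p_o.

    Follows from Equation 3: below P_o = 0.5 the left-ear score is 0,
    above it the right-ear score is 1. Valid for 0 < p_o < 1.
    """
    if p_o <= 0.5:
        return 2.0 * p_o, 0.0
    return 1.0, 2.0 * p_o - 1.0


for p_o in (0.25, 0.5, 0.75, 0.9):
    p_r, p_l = most_divergent_scores(p_o)
    print(p_o, round(poc_prime(p_r, p_l), 3), round(poe_prime(p_r, p_l), 3))
```

At P_o = 0.75, for instance, the most divergent scores are (1.0, 0.5), which
give POC' = 1/3 = (1 - P_o)/P_o and POE' = 1, in agreement with Equations 8
and 9.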



Thus, it is evident that the range of POC' is unconstrained at low
performance levels and the range of POE' is unconstrained at high performance
levels, but where one index is unconstrained the other is severely limited by
performance level. The same is true for POC and POE. Harshman and Krashen
(1972) preferred POE over POC after empirically demonstrating a high positive
correlation between P_o and POC but a low correlation between P_o and POE, as
computed over a number of studies in the literature. This finding can be
explained by the fact that high performance levels are more commonly
encountered in dichotic studies than low performance levels, so that the
majority of the reported scores fell in the region where POE_max rather than
POC_max is independent of performance level.

A quite different and original approach was taken by Kuhn (1973),
who proposed an existing statistical index, the φ coefficient, as the
solution to the performance problem. However, Levy (in press) has
presented mathematical proof and empirical evidence that the φ coefficient
does depend on performance level. The theoretical argument can be made in
simplified form by pointing out the relationship between φ and POC' and POE':

$$\phi = (P_R - P_L)/[(P_R + P_L)(2 - P_R - P_L)]^{1/2} = [(\mathrm{POC}')(\mathrm{POE}')]^{1/2} . \tag{10}$$

Then, from Equations 8, 9, and 10,

$$\phi_{\max} = \begin{cases} [P_o/(1 - P_o)]^{1/2} & \text{if } 0.0 \le P_o \le 0.5 \\ [(1 - P_o)/P_o]^{1/2} & \text{if } 0.5 < P_o \le 1.0 \end{cases} \tag{11}$$

Thus, φ_max, much like d_max, is constrained by P_o at all performance levels
except 0.5. This is illustrated in Figure 1d.

Being a conjunction of POC' and POE'--viz., their geometric mean--
φ combines the constraints of these two indices. The most obvious solution
is a disjunctive use of POC' and POE' that takes advantage of the fact that
each is unconstrained in one half of the range of P_o. Thus,

$$e = \begin{cases} \mathrm{POC}' = (P_R - P_L)/(P_R + P_L) & \text{if } 0.0 \le P_o \le 0.5 \\ \mathrm{POE}' = (P_R - P_L)/(2 - P_R - P_L) & \text{if } 0.5 < P_o \le 1.0 \end{cases} \tag{12}$$

Since e = d/d_max (cf. Equations 2 and 3), e_max = 1, and the range of e
thus is completely independent of P_o. The idea to express the observed
ear difference as a proportion of the maximally possible ear difference at a
given performance level was first conceived by Halwes (1969) and, more
recently and apparently independently, by Marshall et al. (1975), who called
their index f. The solution seems straightforward--it is a simple
multiplicative rescaling of d to fit its restricted range.
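Equations 10 and 12 translate directly into code. The Python sketch below is
illustrative only (the function names and the example scores are assumptions
introduced here); it computes φ and e from a pair of ear scores, with no
correction for guessing.

```python
import math


def phi(p_r, p_l):
    """Equation 10: phi coefficient of the 2 x 2 table (row sums equal to 1).

    Undefined when both scores are 0 or both are 1.
    """
    return (p_r - p_l) / math.sqrt((p_r + p_l) * (2.0 - p_r - p_l))


def e_index(p_r, p_l):
    """Equation 12: POC' at or below P_o = 0.5, POE' above it."""
    p_o = (p_r + p_l) / 2.0
    if p_o <= 0.5:
        denominator = p_r + p_l          # POC' branch
    else:
        denominator = 2.0 - p_r - p_l    # POE' branch
    if denominator == 0.0:
        raise ValueError("index undefined when both scores are 0 or both are 1")
    return (p_r - p_l) / denominator


# Made-up score pairs (P_R, P_L) at low, intermediate, and high performance.
for p_r, p_l in [(0.3, 0.1), (0.8, 0.4), (1.0, 0.6)]:
    print(p_r, p_l, round(phi(p_r, p_l), 3), round(e_index(p_r, p_l), 3))
```

The last pair, (1.0, 0.6), is the most divergent pair possible at P_o = 0.8:
e equals 1 there, whereas φ reaches only 0.5, the value of φ_max given by
Equation 11.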

Nevertheless, e is not necessarily the optimal index. The kind of
theoretical and empirical support that is needed to determine the correct
index will be discussed in Section 1.5 (see also Marshall et al., 1975). At
this point, I would like to consider a more obvious shortcoming of the e
index (and all other indices proposed, for that matter): its failure to
correct for guessing. Strangely enough, a correction for guessing has never
been considered in the past, although it is obvious that guessing plays a
substantial role in most dichotic experiments. In the next section, I will
propose a correction for this factor.

1.4. Correction for Guessing

In order to deal with the guessing problem, we need to consider the
scores for each ear, not just their difference d, as a function of P_o. This
is illustrated in Figure 2a. The diagonal line labeled P_R = P_L is the case
of no ear advantage (d = 0). In this case, P_R = P_L = P_o, regardless of the
guessing probability. At the other extreme, consider the maximal and minimal
possible ear scores, P_Rmax and P_Lmin, as a function of P_o. (We assume here,
without loss of generality, that the right ear is the dominant ear; the
corresponding results for left-ear advantages are obtained by interchanging
the R and L subscripts.) Let us first assume that N, the number of stimuli,
equals infinity, so that the guessing probability is zero. Then the lowest
possible performance level, P_omin, is zero, and, of course, P_Rmax = P_Lmin = 0
if P_o = P_omin = 0. As P_o increases, P_Rmax increases linearly towards 1.0
while P_Lmin remains at 0; consequently, P_Rmax = 2P_o and P_Lmin = 0 for
0.0 <= P_o <= 0.5. At P_o = 0.5, P_Rmax reaches 1.0 and remains at this level
while P_Lmin begins to increase with P_o; consequently, P_Rmax = 1 and
P_Lmin = 2P_o - 1 for 0.5 <= P_o <= 1.0. Thus, the maximally divergent scores
for the two ears are represented by the large parallelogram labeled N = ∞ in
Figure 2a. Of course, P_Rmax - P_Lmin = d_max, whose relation to P_o is shown
in Figure 1a and again in Figure 2b as the function labeled N = ∞.

Now consider the more realistic case of a nonzero guessing probability.
Two typical cases, N = 6 and N = 4, are illustrated in Figure 2a. The lowest
expected performance level for a given number of stimuli, P_omin, is found to
be

$$P_{o\min} = (N - 1)\Big/\binom{N}{2} = 2/N . \tag{13}$$

This is the performance level that would be expected if the subject produced
only completely random guesses, because (N - 1) of the possible
$\binom{N}{2} = N(N - 1)/2$ combinations of two responses lead by chance to a
correct response for one ear. Thus, P_Rmax = P_Lmin = P_omin if P_o = P_omin.
From this minimum, P_Rmax increases linearly towards 1.0 as P_o increases,
while P_Lmin remains at chance level. However, this chance level does not
remain constant but depends on P_Rmax. At the point of maximal ear
difference, P_Rmax reaches 1.0, and

$$P_{L\min} = 1/(N - 1) , \tag{14}$$

which is the simple guessing probability for N stimuli. (It is not 1/N
because the right-ear response must be different from the left-ear response.)
In other words, at this point a hypothetical listener with the maximal
possible ear difference always can identify the stimulus in the dominant ear,
but produces a random guess for the stimulus in the other ear. The maximal
ear difference d_max at this point is

$$d_{\max} = P_{R\max} - P_{L\min} = 1 - 1/(N - 1) = (N - 2)/(N - 1) , \tag{15}$$

which is the maximal expected ear difference for a given N. It occurs at a
performance level of

$$P_o = 1/2 + 1/[2(N - 1)] = N/[2(N - 1)] . \tag{16}$$

From this point on, P_Rmax remains at 1.0 and P_Lmin increases with P_o. The
complete functions relating P_Rmax and P_Lmin to P_o are

$$P_{R\max} = \begin{cases} [2/(N - 2)][(N - 1)P_o - 1] & \text{if } 2/N \le P_o \le N/[2(N - 1)] \\ 1 & \text{if } N/[2(N - 1)] < P_o \le 1.0 \end{cases} \tag{17}$$

$$P_{L\min} = \begin{cases} [2/(N - 2)](1 - P_o) & \text{if } 2/N \le P_o \le N/[2(N - 1)] \\ 2P_o - 1 & \text{if } N/[2(N - 1)] < P_o \le 1.0 \end{cases} \tag{18}$$
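The chance level of Equation 13 and the boundary functions of Equations 17
and 18 can be verified by direct enumeration. The sketch below is written in
Python with exact fractions; the function names are assumptions introduced
for illustration, and performance levels should be passed as `Fraction`
objects (or strings such as "0.6") to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations


def chance_level(n):
    """Probability that a random pair of two different responses, drawn
    from n stimuli, contains the stimulus presented to a given ear."""
    target = 0  # by symmetry, any stimulus label will do
    pairs = list(combinations(range(n), 2))
    hits = sum(1 for pair in pairs if target in pair)
    return Fraction(hits, len(pairs))


def extreme_scores(p_o, n):
    """Equations 17 and 18: (P_Rmax, P_Lmin) at level p_o for n stimuli."""
    if n < 3:
        raise ValueError("n must be at least 3")
    p_o = Fraction(p_o)
    p_min = Fraction(2, n)              # Equation 13
    p_peak = Fraction(n, 2 * (n - 1))   # Equation 16
    if not p_min <= p_o <= 1:
        raise ValueError("p_o must lie between 2/n and 1")
    if p_o <= p_peak:
        scale = Fraction(2, n - 2)
        return scale * ((n - 1) * p_o - 1), scale * (1 - p_o)
    return Fraction(1), 2 * p_o - 1


for n in (4, 6):
    print(n, chance_level(n), extreme_scores(Fraction(n, 2 * (n - 1)), n))
```

For N = 4 the enumeration gives a chance level of 1/2, and at
P_o = N/[2(N - 1)] = 2/3 the extreme scores are P_Rmax = 1 and P_Lmin = 1/3,
so that d_max = 2/3 = (N - 2)/(N - 1), in agreement with Equations 14 and 15.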



Figure 2b shows the corresponding relationship between d_max and P_o for
N = ∞, N = 6, and N = 4. For a finite N, this function is

$$d_{\max} = \begin{cases} [2/(N - 2)](NP_o - 2) & \text{if } 2/N \le P_o \le N/[2(N - 1)] \\ 2(1 - P_o) & \text{if } N/[2(N - 1)] < P_o \le 1.0 \end{cases} \tag{19}$$

(The function for N = ∞ is given in Equation 3.)

Now we define e_g--as we will call e with the correction for guessing--as

$$e_g = d/d_{\max} = \begin{cases} (P_R - P_L)/[2(NP_o - 2)/(N - 2)] & \text{if } 2/N \le P_o \le N/[2(N - 1)] \\ (P_R - P_L)/[2(1 - P_o)] & \text{if } N/[2(N - 1)] < P_o \le 1.0 \end{cases} \tag{20}$$
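Equation 20 can be sketched as a small Python function. The sketch is
illustrative only: the function name `e_g` and the decision to raise an error
where the index is undefined are assumptions introduced here.

```python
def e_g(p_r, p_l, n):
    """Equation 20: ear advantage index e corrected for guessing.

    p_r, p_l: proportions correct for the right and left ear.
    n: number of different stimuli used in the test (at least 3).
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    p_o = (p_r + p_l) / 2.0
    d = p_r - p_l
    p_peak = n / (2.0 * (n - 1))                 # Equation 16
    if p_o <= p_peak:
        d_max = 2.0 * (n * p_o - 2.0) / (n - 2)  # lower branch of Equation 19
    else:
        d_max = 2.0 * (1.0 - p_o)                # upper branch of Equation 19
    if d_max <= 0.0:
        # At or below chance (P_o <= 2/n) and at perfect performance the
        # index is undefined; raising an error is a choice made in this sketch.
        raise ValueError("e_g is undefined at this performance level")
    return d / d_max


print(round(e_g(0.7, 0.5, 4), 3))   # lower range of performance
print(round(e_g(0.9, 0.7, 4), 3))   # upper range, where e_g equals POE'
```

With N = 4, for example, scores of P_R = 0.7 and P_L = 0.5 give e_g = 0.5,
whereas the uncorrected e of Equation 12 for the same scores is only 0.25.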



Equation 20 shows that e_g is identical to e--and thus to POE'--in the upper
range of performance levels. In other words, POE' is unaffected by guessing
probability and needs no correction. It is only in the lower range of
performance, where POC' applies, that a correction for guessing becomes
necessary. Without it, the magnitudes of ear advantages at low performance
levels would be seriously underestimated.

The correction for guessing that I just proposed is only a global and
approximate solution. Ideally, such a correction should be based on a detailed
model of perceptual and response processes in dichotic listening. At present,
such a model does not exist. Recently, I have considered a very simple
probabilistic model that assumes that the listener either perceives a stimulus
correctly or makes a random guess, independently for each ear. I found that
the e index based on the resulting estimates of the "true" probabilities of
perceiving left- and right-ear stimuli is almost identical to e_g. However,
the model is too simple to provide a complete account of the perception of
dichotic stimuli. A more detailed discussion of this approach is provided in a
separate paper (Repp, 1977b).

1.5. Isolaterality Contours

The "correct" index of the ear advantage must fit both theoretical
conceptions and empirical evidence. Halwes (1969) believed he had solved the
theoretical problem by proposing an index (e) whose range is free of the
constraints of performance level. However, this argument, intuitively
appealing as it is, really attacked the problem from the wrong side, although
it may have led to a correct outcome. Marshall et al. (1975), who also
proposed e as perhaps the best index, correctly stressed that different
indices represent "psychological theories of how an S [subject] changes Rc
and Lc [P_R and P_L] in achieving different overall accuracies" (p. 320). In
other words, performance level must be understood as a consequence of changes
in right-ear and left-ear scores, and the concomitant constraints on the
ranges of certain indices must be accepted if they are predicted by theories
about the form of covariation of P_R and P_L. There is no such theory that
postulates that the range of an ear advantage index must not be constrained
in any region of performance.

However, among the infinite number of possible theories, there is one
class of theories that leads precisely to this outcome--an example is the
theory underlying the e index. In order to clarify this point, consider the
isolaterality contours assumed by different indices, that is, by different
theories of the ear advantage. Isolaterality contours connect points of equal
underlying ear asymmetry at different levels of performance. In Figure 1,
these contours would be parallel horizontal lines within the limits of each
index. It is more illustrative to represent these isolaterality contours in
terms of P_R and P_L, as Marshall et al. (1975) have done. Figure 3 plots P_R
against P_L, so that the isolaterality contours connect all pairs of scores
(P_R, P_L) that are assumed to reflect the same underlying ear asymmetry. To
simplify the exposition, we have assumed in Figure 3 that the guessing
probability is zero; a nonzero guessing probability would have the effect of
restricting the possible score combinations to a region in the upper
right-hand corner of the unit square (or accuracy space, as Marshall et al.
call it).

Figure 3 shows the isolaterality contours assumed by four theories: those
associated with the indices d, POC', POE', and e. Note that the region above
the positive diagonal represents REAs, while the symmetric region below the
positive diagonal represents LEAs. The isolaterality contours are shown only
for REAs; those for LEAs are obtained by symmetric reflection around the
positive diagonal. The isoperformance contours, which connect all pairs of
scores (P_R, P_L) at the same P_o = (P_R + P_L)/2, are straight lines parallel
to the negative diagonal in each case.
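To make the notion of an isolaterality contour concrete, the following Python
sketch traces the contour assumed by the e index for a right-ear advantage
and zero guessing probability. It simply solves Equation 12 for P_R; the
function names are assumptions introduced for illustration.

```python
def e_index(p_r, p_l):
    """Equation 12, without correction for guessing."""
    if p_r + p_l <= 1.0:
        return (p_r - p_l) / (p_r + p_l)
    return (p_r - p_l) / (2.0 - p_r - p_l)


def e_contour(p_l, e):
    """Right-ear score P_R lying on the isolaterality contour with value e.

    The contour has two straight limbs that meet on the negative diagonal
    (P_R + P_L = 1, that is, P_o = 0.5) at P_L = (1 - e)/2.
    """
    if not 0.0 <= e < 1.0:
        raise ValueError("e must lie in [0, 1) for a right-ear advantage")
    if not 0.0 <= p_l <= 1.0:
        raise ValueError("p_l must lie in [0, 1]")
    junction = (1.0 - e) / 2.0
    if p_l <= junction:
        return p_l * (1.0 + e) / (1.0 - e)             # from POC' = e
    return (2.0 * e + (1.0 - e) * p_l) / (1.0 + e)     # from POE' = e


# Points on the contour e = 0.5 at increasing left-ear scores.
for p_l in (0.1, 0.25, 0.4, 0.7):
    p_r = e_contour(p_l, 0.5)
    print(p_l, round(p_r, 3), round(e_index(p_r, p_l), 3))
```

Every point generated this way yields the same value of e, while the
performance level (P_R + P_L)/2 changes along the contour.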

Figure 3 shows that only the e index provides a definite estimate of the
magnitude of the ear advantage for every pair of scores. The other three
indices depicted can give only a lower or upper bound on the ear advantage
if one of the two ear scores is either at chance level or perfect, because
these data points cannot be uniquely assigned to a particular isolaterality
contour. For example, the fact that d cannot exceed 0.4 when P_o = 0.8--due
to the "constraint imposed by performance level on the range of the index,"
discussed in connection with Figure 1--really implies that, if d is the
correct index (that is, if the theory underlying d is correct), any true ear
advantage of d > 0.4 cannot be measured at P_o = 0.8. If the model underlying
d happened to be correct, this disadvantage must be accepted; it cannot be
taken as an a priori argument against the index-theory. Similar arguments
apply to POC' and POE'.

From Figure 3, the close analogy to signal detection experiments that has
also been pointed out by Marshall et al. (1975) is evident. P_L is formally
analogous to the false alarm probability, and P_R to the hit probability in
signal detection. Isolaterality contours correspond to receiver operating
characteristic (ROC) functions, and isoperformance contours to isobias
contours (cf. Green and Swets, 1966). The isolaterality contours assumed by
the e index (Figure 3d) are linear approximations to the ROC functions
resulting from the standard signal detection model assuming underlying normal
distributions with equal variance. This gives the e index some intuitive
plausibility, in view of the success of the standard signal detection model
in many different situations. However, whether it is also a correct model of
dichotic listening remains to be proven. In the absence of stronger
theoretical or empirical support, the alternative models underlying d, POC',
or POE' cannot be ruled out. The POE' model, for example, corresponds to a
"high-threshold" model in terms of signal detection theory that has been
found useful in certain situations (Green and Swets, 1966). There is an
infinity of other possible models; those depicted in Figure 3 are merely the
extreme cases.

Figure 3: Isolaterality contours assumed by the theories underlying four
indices of the ear advantage.

In addition to the intuitive appeal of e, its underlying assumptions may
be plausibly conceptualized as follows. Assume that differences in performance
level reflect different levels of noise in the perceptual-auditory system of
listeners. Further, assume that, as the internal noise level is reduced from
very high to very low, P_R and P_L increase independently of each other in the
form of two ogive functions. The separation between these functions equals the
true ear asymmetry and may be expressed in terms of the signal-detection
statistic d'. This simple conception is identical with the standard signal
detection model, so that e--whose isolaterality contours are a good
approximation to the standard model--would be the correct index (if not d'
itself is chosen as the index, which certainly is an option). Again, however,
this argument has only intuitive plausibility at present. A decision between
different models will require empirical evidence in favor of one or the other.
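If d' itself were chosen as the index, it could be computed from the two ear
scores in the usual equal-variance way, treating P_R as the hit probability
and P_L as the false alarm probability. The Python sketch below uses the
standard library's `statistics.NormalDist`; the function name `d_prime` is an
assumption introduced for illustration, and no correction for guessing is
applied.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime(p_r, p_l):
    """Separation z(P_R) - z(P_L) under the equal-variance normal model.

    Both scores must lie strictly between 0 and 1; the inverse normal
    transform is undefined at the extremes.
    """
    z = _STANDARD_NORMAL.inv_cdf
    return z(p_r) - z(p_l)


# Two score pairs constructed to lie on the same ROC function (d' = 1)
# at different performance levels.
for criterion in (0.5, 1.5):
    p_l = _STANDARD_NORMAL.cdf(-criterion)
    p_r = _STANDARD_NORMAL.cdf(1.0 - criterion)
    print(round(p_r, 3), round(p_l, 3), round(d_prime(p_r, p_l), 3))
```

Positive values indicate a right-ear advantage and negative values a left-ear
advantage, as with the other indices discussed above.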

Unfortunately, empirical tests of the models are difficult. Marshall et
al. (1975) have pointed out that, in analogy to signal detection, it would be
necessary to vary performance level in a number of steps while holding the
underlying ear asymmetry constant. This would generate points on the same ROC
function whose shape could then be determined. There are both theoretical and
practical problems with this approach. The most obvious technique would be to
employ masking noise or some other form of distortion to vary performance
level within a single subject, but it is not clear whether this variation
would be equivalent to the hypothetical variations in internal noise level
that cause variations in P_o between subjects, despite high monaural
intelligibility of the stimuli. Cullen, Thompson, Hughes, Berlin, and Samson
(1974) have varied signal-to-noise ratio in a dichotic two-response paradigm,
but their results are irregular and permit no conclusion. A practical problem
is that ear advantages tend to be rather small and highly variable, so that
an enormous amount of data would be necessary to distinguish different shapes
of ROC functions in the vicinity of the positive diagonal.

Halwes (1969) used the more global empirical approach of taking the
average ear differences obtained for different groups of subjects in a number
of different experiments and plotting them as a function of the "natural"
variations in average performance level between the experiments. When the
average ear advantages were expressed in terms of e, they turned out to be
strikingly independent of performance level, which, at the time, provided
impressive empirical support for the e index. Unfortunately, this result does
not hold up in the light of more recent data. I have surveyed a large number
of dichotic studies conducted since 19[illegible] and found large variations
in the magnitudes of ear advantages from study to study, regardless of
performance level, so that no clear conclusion emerges from the data.



Yet another way of testing the assumptions underlying the e index would be
to conduct an analysis of individual stimulus pairs. A large amount of data
would have to be collected for this purpose. If individual stimulus pairs vary
in their "performance level," then the data points (P_R plotted against P_L)
for all individual stimulus pairs should lie on the same ROC function. This
approach is analogous to that discussed in Section 3 for the single-response
paradigm, and it is certainly worth investigating. However, it is not clear
whether individual stimulus pairs vary more than randomly in "performance
level" (except for the "feature-sharing effect" discussed in Section 4);
performance level has so far been considered a characteristic of the listener,
not of the stimuli. More detailed investigations of the dichotic competition
between individual stimuli are needed.

Thus, although e has the advantage of being the most intuitively
satisfying index, other indices and their corresponding models cannot be ruled
out completely at present. I would recommend, however, that e_g (that is, e
with the correction for guessing--Equation 20) be adopted as an index as long
as there is no evidence that speaks against its use. In the remainder of this
paper, we will describe a simpler approach to measuring the ear advantage
that, despite many analogies, avoids some of the problems inherent in the
two-response paradigm. Some of these problems will become clear as the
discussion proceeds (see especially Section 4). Considering the complexity of
the two-response paradigm, it may be time to look for alternative methods that
perhaps achieve the same goal with fewer complications.

2. DICHOTIC FUSION AND THE SINGLE-RESPONSE PARADIGM

2.1. Dichotic Fusion

When two sounds are presented simultaneously to the two ears, they are not
always perceived as two separate events. Often they fuse into a single sound
image. This is obviously true when the two sounds are exactly identical. In
real life, environmental sounds normally reach both ears, but the signals at
each ear typically show slight differences in spectrum, intensity, and time of
onset. Nevertheless, they give rise to a single localized sound image (Mills,
1972).

Stereo headphones make it possible to present different sounds independently
to the two ears and thus to investigate the mechanisms of binaural
(dichotic) fusion. Laboratory studies have shown that the fusion mechanism
tolerates a certain amount of spectral discrepancy beyond that encountered in
natural situations. For example, dichotic sinusoids within a certain critical
frequency range (the "binaural critical band") are heard as a single tone,
although it may "beat" when low frequencies are involved (Odenthal, 1963;
Perrott and Barry, 1969; Van den Brink, Sintnicolaas, and van Stam, 1976). The
width of the binaural critical band increases with signal frequency (Perrott
and Barry, 1969) and intensity (Perrott, 1970); it also increases as the signal
duration decreases (Perrott, Briggs, and Perrott, 1970). The fused tone is
heard at a frequency intermediate between the two dichotic frequencies
(Odenthal, 1963). Of special importance is the finding that two different tones
that normally would not fuse can be made to fuse by imposing the same
low-frequency modulation onto them (Leakey, Sayers, and Cherry, 1958; Tobias,
1972). In general, it seems that complex auditory signals with similar
waveform envelopes fuse, despite considerable differences in microstructure.

This result is important in the dichotic fusion of speech sounds. The
waveform envelope of a speech signal is determined by its low-frequency
components (primarily the fundamental frequency), while the higher formants
constitute the microstructure. Two different formants presented dichotically
at the same fundamental frequency fuse into a single sound, while two formants
with the same center frequency but with different fundamental frequencies are
heard as separate sounds (Broadbent and Ladefoged, 1957). Thus, a speech
signal may be "split" by filtering it into nonoverlapping low- and
high-frequency bands which, if presented simultaneously to the two ears, are
heard as a single source resembling the original (Broadbent, 1955; Franklin,
1969). Several recent studies have employed the related "split-formant
technique" with synthetic speech, where some formants are presented to one ear
and the remaining formants to the other ear (Rand, 1974; Nye, Nearey, and Rand,
1974; Nearey and Levitt, 1974; Haggard, 1975). Cutting (1976), in his recent
classification of dichotic fusion phenomena, called this "spectral fusion."

Dichotic fusion is not limited to the case where parts of a speech signal
fuse to reconstitute the original whole stimulus. Even if two different
complete utterances are presented, the perceptual result may be a single fused
stimulus, provided that the two dichotic stimuli have sufficiently similar
fundamental frequencies. The fused percept may resemble one or the other
component, or it may be a hybrid (see Cutting, 1976). In assessing dichotic
ear differences, it is important to know whether some or all of the stimuli
fuse. Ideally, the experimenter should be able to control this property of the
stimuli.

The verbal materials used in dichotic listening studies may be roughly
classified into three groups:

(1) Words, digits, and other larger-sized verbal units. Typically, they
are natural speech and acoustically heterogeneous, so that the waveforms in the
two ears show little correspondence. Therefore, they tend not to fuse.

(2) Natural-speech nonsense syllables that have been used extensively in
recent research (for example, Studdert-Kennedy and Shankweiler, 1970; Berlin,
Lowe-Bell, Cullen, Thompson, and Loovis, 1973; Cullen, Thompson, Hughes,
Berlin, and Samson, 1974). The typical set is /ba/, /da/, /ga/, /pa/, /ta/,
/ka/, spoken by the same voice. Some of the dichotic pairs formed from these
syllables may fuse into a single syllable if they are spectrally similar and
properly synchronized; this will depend on the particular stimuli and recording
procedures used. Apart from temporal alignment, however, the experimenter has
little control over fusion. Tests of this kind often contain fused and unfused



Nevertheless, the spectral separation of the two competing signals may affect
performance. Perceptual separability may be viewed as a continuum ranging
from perfect fusion to perfect separability. (See also the discussion of
selective attention in Section 4.4.)

pairs mixed together, which is a methodological disadvantage.



(3) Synthetic syllables (for example, Halwes, 1969; Shankweiler and
Studdert-Kennedy, 1967, 1975). As with any other stimuli, it depends on their
spectral similarity (most of all on their fundamental frequencies) whether they
do or do not fuse. However, the important advantage of synthetic syllables is
that their acoustic properties, and hence their tendency to fuse, are under
the control of the experimenter. Thus it is possible to construct homogeneous
tests that contain only pairs that fuse, or only pairs that do not fuse.

The most widely used synthetic stimulus set is /ba/, /da/, /ga/, /pa/,
/ta/, /ka/ with identical fundamental frequency contours. As with the
analogous natural-speech set of syllables, the reason for their popularity is
primarily the convenience and availability of a stimulus set that tends to give
reliable REAs, not their tendency to fuse, which has been given little
attention. The differences between these stimuli are confined to the first 50
msec or so, which carry the consonantal distinctions. The vowel portions, which
may last for another 250 msec or so, are exactly identical and therefore fuse
perfectly in dichotic presentation. This alone is sufficient to guarantee that
dichotic pairs of these stimuli will sound more or less fused (Halwes, 1969).
The "more or less" will depend on the spectral similarity of the initial 50
msec. Synthetic /ba/, /da/, /ga/, if synthesized so they differ only in the
transitions of the second (and third) formant, fuse perfectly into a single
syllable. This was experimentally demonstrated by requiring subjects to
discriminate dichotic pairs from binaural (identical) pairs of stimuli from the
same set. Most of the subjects, including experienced listeners, performed at
chance level (Repp, 1976b). It is justified, therefore, to call these stimulus
pairs "perfectly fused".

Informal observations suggest that strong fusion is also obtained for the
voiceless set (/pa/, /ta/, /ka/) if the stimuli differ only in their formant
transitions. On the other hand, stimuli that contrast in voicing (and thus in
the relevant cue, voice onset time, so that a periodic waveform in one ear is
accompanied by filtered noise in the other ear during the first 50 msec or so)
are sufficiently different to prevent perfect fusion. The listener has some
indication that different events have occurred in the two ears, but since these
events are immediately followed by a perfectly fused vowel, their discrepancy
is perceived only as a brief noise or roughness accompanying the perception of
a single fused syllable that can be identified without great difficulty.
Dichotic pairs consisting of a single phonetic percept accompanied by an
auditory signal of interaural discrepancy may be called "partially fused".

The actual syllables in this experiment were /bae/, /dae/, /gae/, but the
nature of the vowel is immaterial. Perfectly fused syllables have been used
in a number of other studies since (Repp, 1976c, and unpublished work).

The fusion of synthetic syllables can be effectively prevented by presenting
them at different fundamental frequencies (Halwes, 1969; Repp, 1976a).
Temporal asynchrony also reduces fusion, but as long as the signals overlap,
they may still partially fuse. Some researchers have paired CV syllables that
contrasted in their vowels as well as in the initial consonants
(Studdert-Kennedy, Shankweiler, and Pisoni, 1972). Different vowels with the
same fundamental frequency seem to fuse quite well, although they may be
discriminable from binaural stimuli if they are spectrally dissimilar (Kuwahara
and Sakai, 1976). The frequency of the first formant may play a role in
addition to fundamental frequency, but little work has been done on the fusion
of complex sounds such as vowels. The influence of various other parameters,
such as differences in initial bursts, transition duration, etc., on dichotic
fusion of speech sounds has not been systematically studied. If stimuli
involving such differences are to be used for assessing ear advantages, their
degree of fusion should first be determined.

2.2. The Single-Response Paradigm

The standard procedure requires the subjects in a dichotic test to
identify both competing stimuli. While appropriate with unfused stimuli, the
two-response procedure has also been used with synthetic syllables subject to
dichotic fusion (for example, Shankweiler and Studdert-Kennedy, 1975). It is
not surprising that the overall accuracy was quite low in these studies,
because at least one of the two responses must have been a guess. Although it
is possible to analyze only first responses and ignore second responses, one
cannot be sure that the subjects always record their most confident response
first, even when instructed to do so. Thus, the responses reflecting what the
listeners actually perceived are distributed over two response columns, and it
is impossible for the experimenter to identify them reliably. Hence,
instructions to identify two stimuli when only one is heard are inappropriate.
The only appropriate instruction is simply to identify the syllable heard (the
single-response paradigm). The listener need not even be informed about the
presence of different events in the two ears. Instructions to selectively
attend to one ear are also inappropriate when the stimuli are fused, since it
has been shown that selective attention to one ear has little or no effect with
fused stimuli (Halwes, 1969; Repp, 1976b). The topic of selective attention
will be discussed in more detail in Section 4.4.

Thus, dichotic tests using fused syllables are quite different from those
using unfused stimuli. With unfused stimuli, the subject gives two responses
that are then classified as correct or incorrect. The emphasis is on accuracy
of identification. A large number of errors is desirable. These errors should
be due to dichotic competition only; the monaural intelligibility of the
stimuli should be as high as possible. The "raw" ear advantage (d) is defined
as the difference between the proportions of correct responses for the two
ears.

In a test using fused stimuli, on the other hand, only a single response
is given to each stimulus pair. Ideally, this response should match one or the
other of the component stimuli. Dichotic pairs in which this indeed tends to
be the case (for example, /ba/-/da/, which is heard as either /ba/ or /da/) are
especially desirable. Other pairs also yield hybrid responses such as
"psychoacoustic fusions" or blend responses (Cutting, 1976; Repp, 1976b, 1977a). The






Kuwahara, H., and Sakai, H. Identification and dichotic fusion of time-
varying synthetic vowels. Unpublished manuscript, 1976.
methodological problems created by such responses will be discussed in a later
section. If we consider only the "ideal" pairs, such as /ba/-/da/, where
virtually all responses match one of the two component stimuli, we see that
there are no errors and accuracy is perfect (or, in practice, as good as the
monaural intelligibility of the stimuli). The question is not how accurately
each ear performed but how the competing information was weighted and combined
into a single perceptual outcome. Thus, the emphasis is on dichotic
integration, not on competition. Instead of different accuracy levels for each
ear, we have two complementary proportions representing each ear's share of the
responses. The difference between these proportions represents the "raw" ear
advantage.

Despite the theoretical and methodological differences, the two paradigms
also have much in common. Specifically, the problems encountered in deriving
an appropriate laterality index are rather similar. This will become evident
in the following section, which derives such an index for the single-response
paradigm.

3. LATERALITY INDICES FOR THE SINGLE-RESPONSE PARADIGM

3.1. Ear Dominance and Stimulus Dominance

In this section, we make the simplifying assumption that each stimulus
pair in a test using fused stimuli yields only two kinds of (single) responses,
one that matches the stimulus presented to the left ear, and one that matches
the stimulus in the right ear. One example, already mentioned in the preceding
section, is the pair /ba/-/da/, which is heard as either /ba/ or /da/. (For
other examples, see Section 4.5.) Thus, the responses can be divided into those
reflecting perceptual dominance of the left-ear stimulus and those reflecting
perceptual dominance of the right-ear stimulus. Taking into account the two
possible channel/ear assignments of the stimuli, the data for a single stimulus
pair can then be represented in a 2 x 2 table, as illustrated in Table 2. The
two different channel/ear assignments of the stimuli constitute the rows of
this table, and the two responses the columns. The entries are the proportions
of the two responses for each of the two channel configurations.



TABLE 2: The data structure for a single stimulus pair in the single-response
paradigm, with sample values.

   Channels                      Responses
   LE     RE             /ba/                /da/               Total

   /ba/ - /da/       x_i = 0.276       1 - x_i = 0.724          1.000

   /da/ - /ba/       y_i = 0.487       1 - y_i = 0.513          1.000



Perceptual dominance is a probabilistic phenomenon, so that, in general,
both responses will occur with some frequency over a number of single-response
trials. There are two independent factors that determine which of the two
competing stimuli dominates the perception of the fused syllable at a given
time. One is the tendency of (the stimulus in) one ear to dominate (the
stimulus in) the other ear. It is appropriately called ear dominance and, of
course, is analogous to the ear advantage observed in the two-response
paradigm. The other factor is the tendency of one stimulus to dominate the
other stimulus, regardless of their particular channel assignment. It may be
called stimulus dominance and constitutes an important phenomenon in its own
right (Repp, 1976b).

The two factors are illustrated by the fictitious data in Table 2. Ear
dominance is reflected in the difference between the averages of the diagonal
entries in the 2 x 2 table. In the present example, there is a right-ear
dominance: (72.4 + 48.7)/2 = 60.5 percent of the responses went to the right
ear and only (27.6 + 51.3)/2 = 39.5 percent to the left ear. At the same time,
there is a pronounced stimulus dominance effect, which is reflected in the
difference between the column averages: /da/ was heard in (72.4 + 51.3)/2 =
61.8 percent of the trials; /ba/ only in (27.6 + 48.7)/2 = 38.2 percent.
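
The arithmetic of this example can be retraced with a short sketch. The function name `dominance_summary` and the dictionary keys are illustrative choices, not part of the original paper; only the two proportions x_i = 0.276 and y_i = 0.487 come from Table 2.

```python
def dominance_summary(x, y):
    """Summarize a 2 x 2 single-response table (layout of Table 2).

    x: proportion of left-column responses in the first row
       (these are left-ear responses).
    y: proportion of left-column responses in the second row
       (these are right-ear responses).
    Names and return format are hypothetical, for illustration only.
    """
    return {
        # Ear dominance: averages of the two diagonals.
        "right_ear": ((1 - x) + y) / 2,
        "left_ear": (x + (1 - y)) / 2,
        # Stimulus dominance: averages of the two columns.
        "left_column": (x + y) / 2,              # /ba/ in Table 2
        "right_column": ((1 - x) + (1 - y)) / 2  # /da/ in Table 2
    }

summary = dominance_summary(0.276, 0.487)
for name, value in summary.items():
    print(f"{name}: {100 * value:.2f} percent")
```

The unrounded values fall halfway between two rounded figures (for example 60.55 percent), which is why the percentages quoted in the text still sum to 100.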


It should be emphasized that the information about ear and stimulus
dominance is contained only in the complete 2 x 2 contingency table but not in
its individual rows. The two different channel assignments of a particular
stimulus pair must always be considered together; otherwise, the results can be
misleading. In Table 2, for example, /da/-/ba/ (with /ba/ in the right
ear) shows a slight LEA, while /ba/-/da/ (with /da/ in the right ear) shows a
large REA. Such a result can appear puzzling if it is interpreted without
awareness of the joint operation of two factors, ear dominance and stimulus
dominance (cf. Speaks, Niccum, Carney, and Marble, 1975; Niccum, Speaks, and
Carney, 1976). In fact, the right-ear dominance underlying these data is
cancelled by stimulus dominance in the pair /da/-/ba/, and it is augmented by
stimulus dominance in the pair /ba/-/da/. Neither case in isolation reveals
the actual size of the REA, which lies between these extremes and must be
inferred from the complete contingency table. Likewise, an appropriate
estimate of stimulus dominance in an individual stimulus combination can only
be derived from the complete table.



In order to avoid new acronyms, the abbreviations REA and LEA will be
maintained for the corresponding trends in ear dominance.



It may be argued that stimulus dominance reflects merely response bias, that
is, a stimulus-independent tendency of listeners to give one response more
often than the other. However, stimulus dominance relationships can be
changed by modifying the acoustic structure of the stimuli within phonetic
categories (Repp, 1976b, 1977b), so that they are at least in part
stimulus-dependent. Repp (1976b) hypothesized that stimulus dominance is
completely determined by the relationship of the stimuli to the listener's
perceptual category prototypes. Essentially, this is a theory of response bias.
Stimulus dominance may be considered as the result of the interaction between
the listener's perceptual organization and the structure of the stimuli.

Table 2 bears a close resemblance to the 2 x 2 contingency table for the
two-response paradigm (Table 1). However, in Table 1, the dimensions were
left/right ear and correct/incorrect responses. The analogy becomes closer
if one response in Table 2 is arbitrarily considered as "correct" (for
example, /ba/) and the other as "incorrect" (for example, /da/). The rows of
the two tables remain incompatible, however: in Table 1, they represent the
individual component stimuli in each ear, while in Table 2 they represent the
two possible channel/ear assignments of both component stimuli. In the
two-response paradigm, it is easy to summarize the responses to all stimulus
pairs in a single table; in fact, it is standard procedure to do so, and the
data are rarely broken down to the level of individual stimulus pairs.
Basically, each individual channel assignment of each stimulus pair yields
its own 2 x 2 table (of the form shown in Table 1), and these tables are then
simply added up or averaged. This presents no problem, because each stimulus
pair yields left-ear and right-ear as well as correct and incorrect
responses. In the single-response paradigm, on the other hand, the 2 x 2 tables
for the individual stimulus pairs are not commensurate (their rows and
columns have different labels in each case) and therefore cannot simply be
added up or averaged. Even if we stipulate that the positive diagonal always
contain right-ear responses and the negative diagonal left-ear responses (as
in Table 2), there remains one degree of freedom for the arrangement of the
table. We now show how this problem can be solved.

3.2. The e Index for the Single-Response Paradigm

The problem now at hand is how to compute an appropriate laterality
index for a whole single-response test. It is easy to compute ear dominance
indices for individual stimulus pairs. Despite the different nature of the
entries in Table 1 and Table 2, the structure of the data is almost
completely identical in the two cases, and most of the discussion of Section
1 applies. In particular, the factor of stimulus dominance exacts the same
constraints here as the factor of performance level in the two-response
paradigm. A 50/50 distribution of responses here is analogous to a 50
percent performance level there. The simple difference index, d_i = y_i - x_i,
is unsatisfactory for the same reason that d is unsatisfactory in the
two-response paradigm. (The subscript i indicates that we are dealing with a
single stimulus pair.) Clearly, the best choice is

(21)  $e_i = (y_i - x_i)/(y_i + x_i)$        if $(y_i + x_i)/2 \le 0.5$

      $e_i = (y_i - x_i)/(2 - y_i - x_i)$    if $(y_i + x_i)/2 > 0.5$.

Since the arrangement of the 2 x 2 data table is arbitrary, the
convention of tabulating the less frequent response in the left column (as in
Table 2) may be adopted, so that the first condition always holds and

(22)  $e_i = (y_i - x_i)/(y_i + x_i)$.
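
As a minimal sketch of Equations 21 and 22, the function below computes the index for one stimulus pair. The name `e_index` and the error handling are assumptions added for illustration; the two branches follow Equation 21, and the example values are x_i and y_i from Table 2.

```python
def e_index(x, y):
    """Laterality index e_i for a single stimulus pair (Equation 21).

    x: proportion of left-column responses in the first row (left-ear responses)
    y: proportion of left-column responses in the second row (right-ear responses)
    The function name and the ValueError are illustrative additions.
    """
    if (x + y) / 2 <= 0.5:
        if x + y == 0:
            raise ValueError("left-column response never occurred; e_i is undefined")
        return (y - x) / (y + x)
    # Left column holds the MORE frequent response: normalize by the
    # right-column total instead, as in the second line of Equation 21.
    return (y - x) / (2 - y - x)

# Table 2 as printed: /ba/ (less frequent) is in the left column.
e_table2 = e_index(0.276, 0.487)

# Same data with /da/ in the left column. Rows are swapped as well, so that
# right-ear responses stay on the positive diagonal.
e_rearranged = e_index(1 - 0.487, 1 - 0.276)

print(round(e_table2, 3), round(e_rearranged, 3))
```

Both arrangements of the table give the same value, which is the reason the convention of Equation 22 can be adopted without loss of information.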

Thus, a laterality index can be computed for each individual stimulus
pair. The most straightforward way of arriving at an index for the whole
test would then be to take the average of all the e_i indices. However, these
indices vary considerably in their precision, depending on how much stimulus
dominance deviates from equilibrium. The e_i indices are most reliable when
the two stimuli are in equilibrium, and they become more variable and
unreliable as the relative dominance of one or the other stimulus increases.
This follows straightforwardly from statistical arguments. Therefore, e_i
indices for stimulus pairs with very asymmetrical response distributions
should receive less weight than indices for stimulus pairs with more nearly
symmetrical response distributions. The degree of asymmetry is represented
by the proportion of the less frequent of the two responses, w_i = (y_i +
x_i)/2, which is the appropriate weight to be assigned to each e_i. The
overall E index as the weighted average of the e_i indices is then computed,

(23)  $E = \sum w_i e_i / \sum w_i$

      $= (1/2) \sum [(y_i + x_i)(y_i - x_i)/(y_i + x_i)] / [(1/2) \sum (y_i + x_i)]$

      $= \sum (y_i - x_i) / \sum (y_i + x_i) = e$.

Thus, the result turns out to be identical with the e index computed
from a summary 2 x 2 table for the whole test. We note that, by adopting the
conventions of tabulating the less frequent response in the left column and
right-ear responses in the positive diagonal, we have fixed the format of the
data tables, so that they can now be added up or averaged in a nonarbitrary
way. The e index computed from this summary table is then identical with the
weighted average of the e_i indices for the individual stimulus pairs.
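
The identity in Equation 23 can be checked numerically. In the sketch below, the function names are illustrative, and the list `pairs` is hypothetical sample data: only the first (x_i, y_i) pair is taken from Table 2, and all pairs are assumed to follow the convention that the less frequent response is tabulated in the left column.

```python
def weighted_mean_e(pairs):
    """Weighted average of the e_i indices, with weights w_i = (y_i + x_i)/2."""
    weights = [(x + y) / 2 for x, y in pairs]
    indices = [(y - x) / (y + x) for x, y in pairs]
    return sum(w * e for w, e in zip(weights, indices)) / sum(weights)

def pooled_e(pairs):
    """e index computed from the summed (summary) table."""
    return sum(y - x for x, y in pairs) / sum(y + x for x, y in pairs)

# Hypothetical (x_i, y_i) values; only the first pair appears in the paper.
pairs = [(0.276, 0.487), (0.10, 0.30), (0.05, 0.08)]

print(weighted_mean_e(pairs), pooled_e(pairs))
```

The two printed values agree up to floating-point error, whereas an unweighted mean of the e_i values would generally differ from both.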

The variance of the e_i indices provides us with an estimate of whether
the overall e index is significantly different from zero. Assuming that the
e_i indices are approximately normally distributed around zero if the null
hypothesis is true, we make use of the well-known relation that the estimated
variance of the mean is the sample variance divided by the number of
observations,

(24)  $s^2(e) = s_w^2(e_i)/N$,

where N is the number of stimulus pairs. The subscript w indicates that,
again, we would like to assign more weight to the deviations of the more
reliable indices from the mean than to the deviations of unreliable indices.
We thus compute the weighted variance of the e_i indices as

(25)  $s_w^2(e_i) = \sum w_i (e_i - e)^2 / \sum w_i$

      $= \sum [(y_i - x_i)^2/(y_i + x_i)] / \sum (y_i + x_i) - [\sum (y_i - x_i) / \sum (y_i + x_i)]^2$.

Confidence limits for e can then be estimated by e ± 2s(e). If they do not
include zero, e is significant at approximately p < .05.
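
Equations 24 and 25 can be sketched as follows. All function names are illustrative, and `pairs` is hypothetical sample data in which only the first pair comes from Table 2. The sketch computes the weighted variance in both forms of Equation 25 and then the approximate confidence limits e ± 2s(e).

```python
import math

def pooled_e(pairs):
    """Overall e index from the summed table (Equation 23)."""
    return sum(y - x for x, y in pairs) / sum(y + x for x, y in pairs)

def weighted_variance(pairs):
    """Equation 25, first form: weighted squared deviations of e_i from e."""
    e = pooled_e(pairs)
    weights = [(x + y) / 2 for x, y in pairs]
    indices = [(y - x) / (y + x) for x, y in pairs]
    return sum(w * (ei - e) ** 2 for w, ei in zip(weights, indices)) / sum(weights)

def weighted_variance_shortcut(pairs):
    """Equation 25, second form, written directly in terms of x_i and y_i."""
    total = sum(y + x for x, y in pairs)
    return sum((y - x) ** 2 / (y + x) for x, y in pairs) / total - pooled_e(pairs) ** 2

def confidence_limits(pairs):
    """Approximate limits e - 2s(e), e + 2s(e), with s(e) from Equation 24."""
    e = pooled_e(pairs)
    s = math.sqrt(weighted_variance(pairs) / len(pairs))
    return e - 2 * s, e + 2 * s

# Hypothetical (x_i, y_i) values; only the first pair appears in the paper.
pairs = [(0.276, 0.487), (0.10, 0.30), (0.05, 0.08)]
lower, upper = confidence_limits(pairs)
print(lower, upper)
```

With so few pairs the limits are only a rough guide; the paper's own example is based on 24 stimulus pairs per listener.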

3.3. The e' Index



The e index will be useful as long as the distribution of the e_i indices
is roughly symmetrical. With very asymmetric distributions, however, an
arithmetic mean is not the optimal measure. There is an alternative method
available that also permits an approximate graphical determination of the
laterality index. This method uses the basic concepts of signal detection
theory that have already been referred to in Section 1. In addition, it
provides a direct test of the assumptions underlying the e index. The
procedure is illustrated in Figure 4 using some actual data from a recent
experiment by Repp (1977a).

Attention is again restricted to the less frequent responses only, that
is, to the left columns of the N data tables for the individual stimulus
combinations. The entry in the top row, x_i, represents left-ear responses,
or "false alarms." The entry in the bottom row, y_i, represents right-ear
responses, or "hits." We then plot y_i, or p(H), the hit probability, against
x_i, or p(FA), the false alarm probability, for all stimulus pairs. This
results in a swarm of points located on or below the negative diagonal of the
unit square. (Therefore, only its lower triangular portion is shown in
Figure 4.) To these points, a receiver operating characteristic (ROC)
function may be fitted. The standard ROC function is curvilinear, but for
our purposes little accuracy is lost by simply fitting a linear function. A
straight line through the origin and the data points may be fitted by eye,
or, more precisely, by the method of least squares. The slope b of this line
will range from infinity (perfect REA) to 0 (perfect LEA). In order to
convert this range to the standard scale from +1 to -1, we define

(26)  $e' = (b - 1)/(b + 1)$.

This value can also be read off a linear scale on the negative diagonal, as
illustrated in Figure 4. The triangles are the average results of eight
subjects, while the circles are for a single experienced listener (myself)
who showed an especially large REA. Based on 24 data points (stimulus pairs)
in each case, the e' coefficients are +0.55 and +0.96, respectively.

e' may be directly calculated as

(27)  $e' = \tan[(1/2) \arctan((\sum y_i^2 - \sum x_i^2) / (2 \sum x_i y_i))]$,

which effectively is a rotation of the best-fitting line into the ±45 degrees
sector, so that its slope (the tangent) ranges from +1 to -1.

The e' index is an unbiased measure in terms of signal detection theory,
since it is a simple linear transformation of the area under the ROC
function, a commonly used measure of sensitivity that is independent of any
particular assumptions about the internal representations of the sensory
events (Green and Swets, 1966; Richardson, 1972). Testing the significance
of e' is not straightforward, so that one may rely on the e approximation
(Equation 22) for this purpose.
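
The relation between Equations 26 and 27 can be illustrated with a small sketch. It rests on an assumption not stated in the text: that the line fitted "by least squares" is the one minimizing perpendicular distances (a principal-axis fit through the origin), which is the fit whose slope b reproduces Equation 27 exactly. The function names and the sample points are hypothetical, and `math.atan2` is used in place of a plain arctangent only so that the degenerate case of a zero cross-product sum does not raise a division error.

```python
import math

def e_prime(pairs):
    """e' from Equation 27, for a list of (x_i, y_i) = (p(FA), p(H)) points."""
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    # atan2(a, b) equals arctan(a / b) whenever b > 0, which holds for
    # proportions unless every cross-product x_i * y_i is zero.
    return math.tan(0.5 * math.atan2(syy - sxx, 2 * sxy))

def e_prime_from_slope(b):
    """e' from the slope b of the fitted line (Equation 26)."""
    return (b - 1) / (b + 1)

# Hypothetical points lying exactly on a line through the origin with slope 3.
points = [(0.1, 0.3), (0.2, 0.6)]
print(e_prime(points), e_prime_from_slope(3.0))
```

For points that fall exactly on a line through the origin, both routes give the same value; for scattered points, an ordinary regression of y on x would give a somewhat different slope than the principal-axis fit assumed here.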



TABLE 3: Ear advantages on the voicing dimension. Data of eight subjects
from Repp (1977a).

   Subjects        e'          e         s(e)

   JK            +0.17      +0.16       0.06
   JL            +0.73      +0.72       0.06
   RG            +0.89      +0.89       0.02
   MR            +0.57      +0.49       0.10
   GG            -0.09      -0.12       0.08
   WT            +0.90      +0.88       0.04
   TJ            +0.47      +0.44       0.07
   CW            +0.75      +0.74       0.06



The e' index is usually well approximated by e. Table 3 presents e' and
e coefficients, together with s(e), for the eight subjects in Repp's (1977a)
study. It can be seen that e is generally very close to e'; the largest
deviation occurs for subject MR, who, in fact, showed a highly asymmetrical
distribution of e_i indices. By the ±2s criterion, all coefficients except
that for subject GG (the only case of left-ear dominance) are significant.
It should be noted that s(e) becomes constrained as e approaches ±1
(cf. subjects RG and WT in Table 3), so that it should not be used for
testing whether two coefficients are significantly different from each other.
A nonparametric test may be used for this purpose.
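
The ±2s criterion can be applied to the values in Table 3 with a few lines of code. This sketch assumes that the second numerical column of the table holds e (with e' in the first), which is inferred from the order in which the text names the columns; the variable and function names are illustrative.

```python
# (e, s(e)) per subject, read from Table 3 under the column assumption above.
TABLE_3 = {
    "JK": (0.16, 0.06),
    "JL": (0.72, 0.06),
    "RG": (0.89, 0.02),
    "MR": (0.49, 0.10),
    "GG": (-0.12, 0.08),
    "WT": (0.88, 0.04),
    "TJ": (0.44, 0.07),
    "CW": (0.74, 0.06),
}

def is_significant(e, s):
    """True if the interval e - 2s, e + 2s does not include zero."""
    return abs(e) > 2 * s

not_significant = [name for name, (e, s) in TABLE_3.items()
                   if not is_significant(e, s)]
print(not_significant)
```

Under this reading of the table, subject GG is the only listener whose interval includes zero, in agreement with the statement in the text.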

One important difference between the present procedure of deriving e'
and the signal detection paradigm should be pointed out. In the latter,
"bias" is varied by means of instructions, payoffs, etc., while the stimuli
for which sensitivity is being measured are held constant. If the stimuli
(for example, signal and/or noise levels) were to be changed, the listener's
sensitivity would change, too. In the present case, stimulus dominance takes
the role of bias, and ear dominance that of sensitivity. However, in order
to change stimulus dominance, the stimuli themselves are varied. Thus, it is
assumed that ear dominance is independent of the nature of the stimuli, at
least within a given class (such as initial stop consonants). The validity
of this assumption is an empirical question. It is especially convenient
that determining e' for a set of data at the same time provides a test of its
underlying assumptions: if the linear ROC function fits the data poorly, a
different function and a different index may have to be chosen. So far the
results have been encouraging. Moreover, no correction for guessing is
needed for e' since, in general, guessing plays only a small role in the
single-response paradigm. However, the single-response paradigm is not
without its own problems. The last section discusses a number of
methodological issues and problems so far not considered.

4. PROBLEMS IN MEASURING THE DICHOTIC EAR ADVANTAGE 

4.1. Stimulus Intelligibility

It is good practice to precede a dichotic test with a series of binaural
(or monaural) stimuli, in order to familiarize the listener with their sound
and to find out whether they can be reliably identified. In order to obtain
useful dichotic data, the stimuli must be intelligible and yield high
binaural (or monaural) identification scores.

This goal is more easily achieved with natural speech stimuli than with
synthetic speech. However, synthetic stimuli are desirable because their
acoustic properties can be controlled by the experimenter. Therefore, it is
advisable to use a good set of synthetic stimuli that has been pretested for
intelligibility, a point that has often been neglected in the past.

Repp, B. H. Stimulus dominance and ear dominance in the perception of
dichotic voicing contrasts. (submitted for publication).

Even when the average intelligibility of a set of syllables is high,
their intelligibility should be tested for each individual subject in a given
test. From time to time, individuals are encountered who find it very
difficult to identify synthetic speech sounds. Such individuals may have to
be excused from the test. (This is an obvious problem in clinical
applications of dichotic tests.)

Intelligibility is usually assessed in terms of the confusions that
occur between members of a stimulus set. The information obtained from a
monaural confusion matrix may be used to apply a correction to dichotic data
that leads to a better estimate of the stimulus dominance relationships
between the stimuli (Repp, 1976b). Unfortunately, however, information about
ear dominance cannot be recovered in this fashion: confusable stimuli yield
smaller ear advantages than nonconfusable stimuli (Repp, 1977a). Since this
effect may be confounded with individual differences in confusion patterns,
it is advisable to omit confusable pairs when calculating ear advantage
indices for individual subjects.

Problems arising from confusability of certain stimuli may also be
reduced by using a dichotic listening procedure that does not require a
labeling response. Only one such alternative is mentioned here, originally
proposed by Preston, Yeni-Komshian and Benson (1968), and especially suited
for fused stimuli: the two component stimuli are presented binaurally,
followed by the dichotic pair, and the listener judges whether the dichotic
stimulus was more similar to the first or the second binaural stimulus. I am
currently experimenting with an AXB version of this ABX paradigm, that is,
with the dichotic pair in the middle of each stimulus triad (cf. Repp,
1976a). This method may yield cleaner data than the single-response
identification task, but it is more time-consuming.

The intelligibility-confusability issue raises an important theoretical
problem. Individual differences in the perception of stimuli (especially of
synthetic syllables) are large, and "poor subjects" who produce many
confusions will tend to have smaller ear advantages than "good subjects". The
individual differences that thus confound the measure of the ear advantage
may be ascribed to different levels of "internal noise" in the listeners'
perceptual systems. Now suppose we have succeeded in generating an excellent
stimulus set that produces no confusions at all. Have we eliminated the
individual differences? Overtly, yes; but if the stimuli were attenuated or
mixed with white noise, some subjects probably would produce more confusions
than others. Also, if tested with an acoustic stimulus continuum as used in
categorical-perception studies, some subjects would have sharper category
boundaries than others. Again, this may be ascribed to individual
differences in internal noise level--most likely the same differences that
are evident with confusable stimuli.

Given such individual differences in perceptual accuracy, it is likely
that they play a role in the perception of dichotic stimuli. A fused
dichotic syllable pair is often quite ambiguous, and an unfused stimulus pair
is often degraded through mutual interference between the two stimuli. The
problem is best illustrated with a fused pair, for example, /da/-/ga/, as
shown in Figure 5. Assuming perfect intelligibility of the component stimuli
and no pronounced stimulus dominance effect, this dichotic pair sometimes
sounds like /da/ and sometimes like /ga/. (Because of categorical
perception, the subject may often not be aware of the inherent ambiguity of
the syllable.) For individuals with a REA, the dichotic pair sounds a little
more (often) like /da/ when /da/ is in the right ear, and a little more
(often) like /ga/ when /ga/ is in the right ear. Thus, the two fused stimuli
may be considered as lying on a /da/-/ga/ continuum, a little to the left and
a little to the right of the category boundary, respectively. A "good
subject" with a low internal noise level has a sharp category boundary and
thus resolves the two dichotic stimuli well; he or she will show a clear REA.
A "poor subject", on the other hand, with the same underlying REA as the good
subject, is likely to have a flatter psychometric function separating the two
categories and, as a result, will produce similar response distributions for
the two dichotic pairs and a much smaller REA. This is schematically
illustrated in Figure 5.

If this argument is correct, it implies that individual differences in
the dichotic ear advantage may be inextricably confounded with individual
differences in internal noise level. This would be a serious obstacle to
measuring individual ear advantages on an ordinal scale.
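
The argument can be made concrete with a small numerical sketch. It is an
illustration under stated assumptions, not a model taken from the text: the
psychometric function is assumed to be logistic, the category boundary is
placed at zero, and the names `shift` (the underlying ear asymmetry) and
`noise` (the slope parameter standing in for internal noise) are
hypothetical.

```python
import math

def p_ga(x, noise):
    """Probability of a /ga/ response at continuum position x (boundary at 0).

    Assumed logistic psychometric function; larger `noise` means a flatter curve.
    """
    return 1.0 / (1.0 + math.exp(-x / noise))

def observed_rea(shift, noise):
    """Right-ear minus left-ear response proportion for one fused pair."""
    # /da/ in the right ear: the percept lies `shift` units on the /da/ side.
    p_right_when_da_right = 1.0 - p_ga(-shift, noise)
    # /ga/ in the right ear: the percept lies `shift` units on the /ga/ side.
    p_right_when_ga_right = p_ga(+shift, noise)
    p_right = 0.5 * (p_right_when_da_right + p_right_when_ga_right)
    return p_right - (1.0 - p_right)

SHIFT = 0.5  # hypothetical underlying asymmetry, identical for both listeners
good = observed_rea(SHIFT, noise=0.25)  # sharp category boundary
poor = observed_rea(SHIFT, noise=2.0)   # flat psychometric function
print(f"good subject: {good:+.2f}   poor subject: {poor:+.2f}")
```

Both simulated listeners have the same underlying shift, yet the observed
ear advantage shrinks as the assumed noise level grows, which is the
confounding described above.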

This problem seems to be less acute in the two-response paradigm; there,
variations in perceptual accuracy are translated primarily into variations in
performance level that can be dealt with more easily. However, this apparent
advantage of the two-response paradigm is offset by a number of disadvantages
that are discussed in the next paragraphs.

4.2. Stimulus Dominance

The factor of stimulus dominance can be dealt with elegantly in the
single-response paradigm, thanks to the analogy with performance level in the
two-response paradigm. However, this analogy is purely formal--these are the
factors that can be effectively handled in the respective paradigms by using
similar methods--but they are conceptually very different. As the preceding
paragraphs have shown, presumably there are variations in "performance level"
(that is, perceptual accuracy) in the single-response paradigm, but they are
covert and much more difficult to deal with. Correspondingly, there is the
problem of how to deal with stimulus dominance in the two-response paradigm.

Although stimulus dominance may be expected to play a smaller role in
the two-response paradigm, there is evidence that it is nevertheless present.
Berlin et al. (1973), for example, have reported that unfused natural-speech
syllable pairs that contrast in voicing receive more correct voiceless
responses than correct voiced responses. Berlin et al. reduced this
asymmetry by aligning the stimuli at the first pitch pulse rather than at
stimulus onset. Speaks et al. (1975), who used the same alignment criterion,
reported data suggesting that stimulus dominance effects are of minor
importance with natural-speech stimuli. However, this issue needs to be
investigated in more detail. There is no doubt that strong stimulus
dominance reduces the manifest ear advantage, and if some individuals show
stronger effects than others, these individual differences are confounded
with the measure of the ear advantage. At present, I know no way of dealing
with this potential problem.

Figure 5: Schematic illustration of the effect that individual differences
in perceptual acuity might have on ear dominance for fused
syllables.

4.3. Guessing

Guessing plays an insignificant role in the single-response paradigm.
Random guesses following lapses of attention or highly ambiguous stimuli may
occur now and then, but, in general, the listener reports what he or she
hears and does not resort to guessing (except for "sophisticated" guessing
between a very limited number of alternatives). In the two-response
paradigm, on the other hand, guessing is commonplace. Frequently a listener
can identify the stimulus in one ear but has no clues about the stimulus in
the other ear. The resulting guesses cannot be reliably identified in the
data and lead to a considerable amount of random variation. The correction
for guessing proposed in Section 1.2 is a rather crude procedure, and
alternative ways of dealing with the guessing problem should be considered.

One obvious possibility is to instruct the listeners not to guess, that
is, to give zero, one, or two responses per stimulus pair, depending on how
many stimuli they heard clearly. This method has rarely been used, because
the resulting heterogeneous protocols are difficult to analyze. More common
instructions have been to write down the more confident response first and
to analyze only these responses. Effectively, this is the single-response
paradigm applied to unfused stimuli. (The second response might just as well
be omitted.) If both stimuli can be identified, it amounts to a judgment of
which of them is the more salient. This procedure is interesting because it
reduces guessing and permits the methods of Section 3 to be applied so that
stimulus dominance can be taken into account. The main problem is the
control of selective attention, discussed in the next paragraphs.

4.4. Selective Attention

The most important difference between fused and unfused syllables lies
in the effect of selective attention. Perfectly fused syllables are heard as
originating in the middle of the head, and voluntary efforts to pay attention
to one ear have no effect on the responses (Repp, 1976b). The effect of
selective attention with partially fused syllables appears to be very small,
although this issue deserves further investigation (see Halwes, 1969; Repp,
1976a). Unfused syllables, on the other hand, yield large attentional
effects, and practiced listeners are able to reach almost perfect scores when
reporting only the syllables in one ear (Halwes, 1969). It is fair to say
that the effectiveness of selective attention is a direct function of the
degree of fusion of two stimuli (cf. also Footnote 2).

It follows that, with unfused syllables, it is not possible to separate
attentional preferences, effectiveness, or bias from the ear advantage per se
that presumably has a physiological basis. Some researchers have
hypothesized that the ear advantage is entirely an attentional phenomenon
(Kinsbourne, 1973, 1975; Morais and Landercy, 1977) or a perceptual advantage
for stimuli localized to the right of the midline (Morais and Bertelson,
1973, 1975; Morais, 1975; Hublet, Morais, and Bertelson, 1976). The fact
that a REA is obtained for perfectly fused syllables in the absence of any
attentional effects (Repp, 1976b, 1976c) suggests that there are both
physiological and attentional components of the ear advantage. Perfectly
fused syllables may yield an estimate of the physiological component alone,
with the attentional component removed. This makes tests using fused
syllables such promising instruments. With unfused syllables, physiological
and attentional effects are confounded.

In fairness, one should distinguish two kinds of attentional effects:
automatic and strategic biases. Automatic biases may arise from
contingencies and expectancies within the experimental situation; for
example, during or after processing a verbal stimulus, the left hemisphere
may be activated more than the right, leading to an automatic bias for
stimuli on the right side. These kinds of involuntary biases are what
Kinsbourne and Morais have in mind. The REA for unfused syllables apparently
can be influenced by contextual factors (Goldstein and Lackner, 1974; Morais
and Landercy, 1977); whether the same is true with fused syllables remains to
be investigated. As far as individual differences are concerned, automatic
attentional effects are difficult to distinguish from the physiological or
functional asymmetry itself; they are probably highly correlated. Strategic
biases, on the other hand, are voluntary and at the disposition of the
listener. For example, by deliberately paying attention to the left ear,
even persons with a strong REA can produce a LEA with unfused stimuli. Such
strategies are not under control in the standard two-response paradigm, so
that the ear advantages obtained are not a pure measure of lateral asymmetry.

This is especially obvious in the single-response paradigm when applied
to unfused stimuli. It does not suffice to instruct the listeners not to pay
attention to either ear; especially inexperienced listeners may not follow
these instructions, and there is no way of controlling whether they do. It
may be difficult in principle to "neutralize" attention. Requiring two
responses at least reduces the effect that attentional biases would have in
the single-response paradigm. There remains the possibility of controlling
the listener's strategies by instructions to pay attention to one or the
other ear and to report only the stimuli in that ear. This procedure has
been followed by several researchers, although usually not for the purpose of
assessing ear advantages. It may be considered as a two-response paradigm in
two passes; in this case, a single ear advantage index would be computed
after combining the results of the two (properly counterbalanced)
selective-attention conditions. The problem is that here, because of the
relative efficiency of selective attention, performance level will be rather
high, making the ear advantage index less reliable. Alternatively, the two
selective-attention conditions may be considered as single-response
paradigms, and two separate single-response indices may be computed whose
difference is then taken as the measure of the ear advantage. However, here
we encounter the same problem as with the d index: simple differences depend
on the absolute size of the numbers involved, so that the resulting index
reflects individual differences in the relative effectiveness of selective
attention in addition to the ear advantage itself. Regardless of the form of
data analysis, there is the theoretical possibility that there are lateral
asymmetries in the effectiveness of voluntary selective attention that are
independent of the ear advantage itself and again would confound the measure
of the ear advantage.

We conclude that there is no perfect way of controlling attentional
strategies with unfused stimuli, so that fused stimuli offer a significant
methodological advantage in this respect. Future research will determine
whether the relatively small ear advantages obtained with perfectly fused
syllables are the relatively "pure" measure of physiological asymmetry that I



4.5. Blend Responses

In the discussion of the single-response paradigm (Section 3), it was
assumed that only two kinds of responses are given to a fused dichotic pair;
they match one of the two component stimuli and can be assigned to one or the
other ear. Of the fifteen possible combinations of the six standard
stop-consonant-vowel syllables, only seven meet this strict criterion, given
that they are highly intelligible in isolation. These pairs are the place
contrasts /ba/-/da/, /da/-/ga/, /pa/-/ta/ and /ta/-/ka/, and the voicing
contrasts /ba/-/pa/, /da/-/ta/ and /ga/-/ka/. These are the stimulus pairs
especially suited for the methodology outlined in Section 3.

However, it may be desirable for some purposes to include other stimulus
combinations as well, and past experiments have almost always included all
possible combinations of the stimuli. The two place contrasts, /ba/-/ga/ and
/pa/-/ka/, may receive a third response, /da/ and /ta/, respectively.
Cutting (1976) has called these intermediate percepts "psychoacoustic
fusions". Their frequency may be negligible for many listeners, but some give
a substantial proportion of these responses (Repp, 1976b). The remaining
stimulus combinations are the six double-feature contrasts: /ba/-/ta/,
/ba/-/ka/, /da/-/pa/, /da/-/ka/, /ga/-/pa/ and /ga/-/ta/. They typically
yield two additional responses per pair, resulting from the combination of
the feature values of the component stimuli; for example, /ba/-/ta/ is heard
not only as /ba/ or /ta/, but also as /pa/ and /da/. These "blend" responses
are usually quite frequent and may even exceed the proportions of correct
responses, although there is much variation between stimulus pairs and
subjects in this respect (Halwes, 1969; Repp, 1977a). Blend responses and
psychoacoustic fusions usually do not convey direct information about ear
asymmetries, so that the question arises what to do with them.

Hybrid responses also occur with unfused syllables (Halwes, 1969;
Studdert-Kennedy and Shankweiler, 1970). In the two-response paradigm, they
are simply grouped together with other types of errors in the class of
incorrect responses. As a result, double-feature contrasts typically have
higher error rates than single-feature contrasts, an effect that has been
termed the "feature-sharing advantage" (Studdert-Kennedy and Shankweiler,
1970; Studdert-Kennedy et al., 1972; Pisoni, 1975). The availability of the
correct-incorrect distinction makes it easy to dispose of blends in the
two-response paradigm.

In the single-response paradigm, on the other hand, we have assumed that
all responses are "correct," apart perhaps from a few random errors (that may
be divided up between the two response categories). Blend responses are
different from random errors in that they reflect what the listener actually
heard; in a sense, they are correct responses. However, they cannot be
unambiguously assigned to one or the other ear. There are two ways of
dealing with them. One possibility, followed by Repp (1977a), is to analyze
the data in terms of the individual phonetic features and to calculate two
laterality indices, one for voicing and one for place. If only one feature
is considered at a time, blend responses become informative with respect to
lateral asymmetries. The two resulting indices may be averaged to obtain a
single index.

The other possibility, which consists of discarding blend responses, is
more problematic. Consider the following example, shown in Table A:



TABLE A: Fictitious response distribution for a double-feature contrast
pair.

    Stimuli                     Responses

    LE     RE          /ba/     /ta/     /pa/     /da/

    /ba/ - /ta/         0       .33      .33      .33

    /ta/ - /ba/        .5        0       .25      .25



Here omission of blends (/pa/ and /da/ responses) would lead to the
conclusion that there is a perfect REA for this stimulus pair. However, when
the data are analyzed for each feature separately, it is found that there are
only moderate REAs for voicing and place (e = +0.46 for both). If blend
errors are discarded, this information is lost and the REA is inflated. It
is not clear which should be considered the correct index: the average of
the separate indices for the two features or the index based on "correct"
responses only.
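
The contrast between the two scoring rules can be traced through the
fictitious numbers of Table A with a short sketch. It computes only simple
right-ear response shares, not the laterality index used in the text (whose
formula is defined in an earlier section and is not reproduced here), so the
+0.46 value is not expected to appear. The function name `right_ear_share`
and the feature sets are hypothetical helpers introduced for illustration.

```python
# Response proportions from the fictitious table; "left"/"right" give the
# stimulus delivered to each ear.
VOICED = {"ba", "da"}   # voiced syllables among the four response options
LABIAL = {"ba", "pa"}   # labial syllables among the four response options

pairs = [
    {"left": "ba", "right": "ta",
     "resp": {"ba": 0.00, "ta": 0.33, "pa": 0.33, "da": 0.33}},
    {"left": "ta", "right": "ba",
     "resp": {"ba": 0.50, "ta": 0.00, "pa": 0.25, "da": 0.25}},
]

def right_ear_share(pair, feature_set=None):
    """Share of responses credited to the right ear.

    With feature_set=None only responses identical to one of the two
    stimuli are scored (blends are dropped). With a feature set, every
    response is scored on that single feature, so blends count as well.
    """
    right = left = 0.0
    for syllable, p in pair["resp"].items():
        if feature_set is None:
            if syllable == pair["right"]:
                right += p
            elif syllable == pair["left"]:
                left += p
        elif (syllable in feature_set) == (pair["right"] in feature_set):
            right += p
        else:
            left += p
    return right / (right + left)

for pair in pairs:
    label = f"/{pair['left']}/-/{pair['right']}/"
    print(label,
          f"blends dropped: {right_ear_share(pair):.2f}",
          f"voicing: {right_ear_share(pair, VOICED):.2f}",
          f"place: {right_ear_share(pair, LABIAL):.2f}")
```

With blends dropped, every scored response goes to the right ear; scored
feature by feature, the right-ear share is well below unity, which is the
inflation the paragraph above warns about.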

It may be possible to settle the problem by examining empirical
isolaterality contours (ROC functions) for single- and double-feature
contrast pairs. In the meantime, double-feature contrasts and pairs yielding
psychoacoustic fusions are best omitted from dichotic single-response tests,
as long as only the ear advantage is of concern. This leaves us with only
seven of the original fifteen stimulus pairs--perhaps too few to
constitute a useful test. However, synthetic stimuli offer the possibility
of varying the acoustic structure of the stimuli while leaving their phonetic
content unchanged. By varying voice onset time or the formant transitions
within phonetic categories, stimulus dominance relationships can be changed,
so that an e' index can easily be calculated (Repp, 1976b, 1977a). In fact,
it is possible to take a single stimulus pair (for example, /ba/-/pa/), to
select several tokens with different acoustic characteristics (for example,
four different voice onset times within each category), and thus to arrive at
a test that contains a sufficient number of stimulus pairs (sixteen
combinations), is maximally homogeneous, and leads to a clean estimate of ear
dominance (see footnote 6a).7 This illustrates one of the great
methodological advantages of the single-response paradigm over the
two-response paradigm; the latter always requires a larger number of response
alternatives in order to reduce the effect of guessing.



In principle, e' can be calculated without varying stimulus dominance.
However, varying the stimuli and, with them, stimulus dominance is important
in order to avoid extreme dominance asymmetries due to individual
idiosyncrasies, to derive an ROC function, and simply to provide variety in
the test.



4.6. Test Reliability



In the Introduction, I have stressed that dichotic tests must satisfy
general test-theoretical standards. One of these is reliability. As in any
other psychological test, the observed score (ear advantage) of a subject
represents his or her "true" score plus random measurement error. The
magnitude of the measurement error depends on the length of the test. It is
not surprising that, in repeated administrations of a short dichotic test,
the observed ear advantages for a given subject vary considerably and may
even show reversals in direction (Speaks, Niccum, and Carney, 1976). Most
dichotic studies in the past have used short tests whose reliability is
likely to be low. The fact that a certain percentage of right-handed
subjects show either no REA or an LEA (although physiological data suggest
that virtually all are left-hemisphere-dominant for speech) is due, at least
in part, to measurement error (cf. Blumstein, Goodglass, and Tartter, 1975).

Ryan and McNeil (1974) have reported a test-retest reliability
coefficient of +0.80 for a 60-item test, and Blumstein et al. (1975) found a
somewhat lower coefficient of +0.74 for an 80-item test. Both studies used
natural-speech CV syllables in the two-response format. These reliabilities
are quite satisfactory in view of the relative shortness of the tests and the
weaknesses of the two-response paradigm (guessing, attentional fluctuations,
etc.). Researchers in the field have tended to expect too much from a short
dichotic test and have been reluctant to accept the conclusion that much
longer tests will be necessary to obtain precise measurements. If we take
the Blumstein et al. results as typical and apply the standard Spearman-Brown
formula (Lord and Novick, 1968, p. 112), we find that the test has to be
three times as long (about 240 pairs) to achieve a reliability of +0.90, and
six times as long (about 480 pairs) to reach r = +0.95. From the Ryan and
McNeil data, we obtain more moderate estimates of 140 and 280 pairs,
respectively. Considering the fact that the standard set of six CV syllables
yields a basic test unit of 30 dichotic pairs, I would recommend that ten
repetitions of this test unit (that is, 300 pairs) be administered in order
to obtain stable ear advantage indices. Such a test requires about [...]
minutes of listening time and therefore should be feasible under most
circumstances, both in and outside the laboratory.
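
The test-length estimates rest on the Spearman-Brown formula, which can be
sketched as follows. The reliabilities and test lengths are those quoted
above; the function names are hypothetical. The exact factors the formula
yields need not coincide with the rounded figures given in the text.

```python
def spearman_brown(r, k):
    """Predicted reliability of a test lengthened by factor k, given reliability r."""
    return k * r / (1.0 + (k - 1.0) * r)

def lengthening_factor(r, target):
    """Factor by which a test of reliability r must grow to reach `target`.

    Obtained by solving the Spearman-Brown formula for k.
    """
    return target * (1.0 - r) / (r * (1.0 - target))

# (reported reliability, number of dichotic pairs) as quoted in the text
studies = {
    "Blumstein et al. (1975)": (0.74, 80),
    "Ryan and McNeil (1974)": (0.80, 60),
}

for name, (r, n_pairs) in studies.items():
    for target in (0.90, 0.95):
        k = lengthening_factor(r, target)
        print(f"{name}: target r = {target:.2f}, "
              f"factor {k:.2f}, about {k * n_pairs:.0f} pairs")
```

The inverse form makes the cost of high reliability visible: the required
factor grows steeply as the target approaches 1.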

Underlying the development of the single-response methodology is the
hope that this procedure will prove to be more reliable than the traditional
two-response paradigm. Alternative methods, such as the AXB paradigm
mentioned earlier, may also lead to increased reliability. I plan to conduct
pertinent studies in the near future.

4.7. Homogeneity and Validity

The problem of test reliability is a practical one that always can be
solved by using a test of sufficient length. More important is the
theoretical question of what is actually being measured--the validity of the
test. Ultimately its validity as an instrument for assessing hemispheric
dominance needs to be established by physiological criteria of functional
lateralization. At present, however, these physiological measurements are
still crude and hazardous; moreover, they are a less crucial criterion than
they may seem at first thought. First of all, the only reliable
physiological indicator in normal subjects, the Wada test, yields only
categorical outcomes (left, right, or no dominance), not a graded scale of
lateralization. Moreover, it really supplies a useful criterion only for the
small group of left-handers, since it is now well established that virtually
all right-handers are left-hemisphere-dominant for speech. Secondly, the
original idea that the dichotic ear advantage directly reflects hemispheric
dominance for speech is probably an oversimplification. It is likely that
there are multiple factors underlying the dichotic ear advantage, only one of
which is the (quite possibly all-or-none) dominance of one hemisphere for
speech. The primary task of the theoretical study of the ear advantage must
therefore be to determine what is actually being measured. This is a
difficult problem, but some preliminary steps are possible by asking the
following familiar test-theoretical questions: Do all items in a test
measure the same underlying variable(s)? And do different tests composed of
items from the same general class (viz., those that tend to yield an average
REA) measure the same underlying variable(s)?

Extremely encouraging results have been obtained recently by Bruce Wexler
(personal communication, 1976). Using a 60-item test of relatively [...]
syllables in a single-response paradigm, Wexler obtained reliabilities well
above +0.90 for both normal and psychotic subjects. (See also footnote 6a.)

These important (and closely related) questions about within-test (or
item) homogeneity and between-test homogeneity (or validity) have been
totally ignored in the past. Their answers are by no means obvious.
Consider the question of item homogeneity. Repp (1976b), for example, found
that two fused stimulus pairs of a three-item test yielded REAs but the third
pair did not. More evidence on this problem is needed. The statistical
techniques that may be applied are intercorrelation of laterality indices for
individual items in a test and subsequent factor analysis, or perhaps an
adaptation of the more recent methods of stochastic test theory (Rasch, 1960;
Lord and Novick, 1968). These analyses should determine whether all items in
the test measure a single factor, or whether different items measure
different factors. The derivation of an ROC function in the single-response
paradigm (Section 3.2) also constitutes a (less rigorous) test of item
homogeneity. (However, even if it turned out that only a single factor is
being measured, this would show only that the test is homogeneous and all
items measure the same thing; the single factor may nevertheless represent a
complex of underlying variables.)
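
The first step of such an analysis, the intercorrelation of item-level
laterality indices, might look like the sketch below. The data are invented
purely for illustration (five subjects, three items) and do not come from
any study cited here; the factor-analytic step is not shown.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical laterality indices: one row per subject, one column per item.
indices = [
    [0.30, 0.25, -0.10],
    [0.10, 0.05, 0.20],
    [0.50, 0.45, 0.00],
    [0.20, 0.20, 0.15],
    [0.40, 0.30, -0.05],
]

items = list(zip(*indices))  # regroup the scores by item
matrix = [[pearson(a, b) for b in items] for a in items]

for row in matrix:
    print("  ".join(f"{r:+.2f}" for r in row))
```

In these invented data the first two items order the subjects almost
identically while the third does not, the kind of pattern that would argue
against item homogeneity.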


A related problem is whether the ear advantages for different phonetic
features reflect the same underlying factors. In a recent study of partially
fused dichotic double-feature contrasts, I obtained e' coefficients
separately for voicing and place; they correlated only +0.64, although each
index was based on the same 768 trials (Repp, 1977a). I hypothesized that
individual differences in perceptual organization may be reflected in the
dichotic ear advantage (see also Section 4.1). Tests of this hypothesis are
needed.

Finally, the homogeneity question needs to be asked about whole tests:
Do tests composed of different types of speech stimuli (CVs, VCs, VCVs, or
words; stops, fricatives, or nasals; etc.) measure the same factor? Do tests
composed of natural-speech syllables measure the same factor as synthetic
tests? Do fused and unfused syllables (or: the single-response and the
two-response paradigm) assess the same factor? Again, intercorrelations
between different tests (perhaps supplemented by factor analysis and modern
test theory) should provide an answer. So far, there are no data available.
A positive result would reassure us that we are actually measuring a
well-defined characteristic whose complexities will have to be unravelled
primarily by physiologists. Negative results, on the other hand, disastrous
as they would be for the diagnostic application of dichotic tests, would be
of great theoretical interest. Perhaps there is more than one "ear
advantage"; that is, different tests may tap different dimensions of a very
complex phenomenon.5

4.8. Absolute Magnitude of the Ear Advantage

The questions of reliability, homogeneity, and validity, which are
correlational in nature, must be kept separate from the issue of the absolute
magnitude of the ear advantage. For example, ear advantages may increase
with practice (although the evidence appears to be negative--see Porter,
Troendle and Berlin, 1976), but as long as they do so for all individuals,
the reliability of the test will not be affected. Different items in a test
may yield different magnitudes of ear advantages, but they nevertheless may
measure the same underlying variable. Similarly, different classes of
stimuli may yield different average magnitudes of REA and nevertheless
measure the same thing. As long as all individuals tested are in basically
the same rank order on each test (or on each item), the homogeneity criterion
is satisfied, and it is immaterial which tests or items are selected for
testing persons, as long as all persons to be compared are tested with the
same tests or items. The variations in the absolute magnitude of the ear
advantage represent variations in item or test "difficulty," in terms of test
theory. It is a separate but nevertheless important question what causes
these variations in difficulty, if they exist. On the other hand, if two
items or tests yield the same average REA, this implies absolutely nothing
about their intercorrelation.
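
The separation of magnitude from rank order can be shown with a few invented
scores. This is a minimal sketch with hypothetical ear advantages for five
subjects on three tests; none of the values come from the studies cited.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

def rank_order(scores):
    """Subject positions sorted from smallest to largest ear advantage."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Hypothetical ear advantages for the same five subjects on three tests.
test_a = [0.10, 0.20, 0.30, 0.40, 0.50]
test_b = [0.35, 0.45, 0.55, 0.65, 0.75]   # larger REA throughout, same ordering
test_c = [0.30, 0.50, 0.10, 0.40, 0.20]   # same values as test_a, reshuffled

for name, scores in (("A", test_a), ("B", test_b), ("C", test_c)):
    print(f"test {name}: mean REA {mean(scores):.2f}, "
          f"rank order {rank_order(scores)}")

same_order_ab = rank_order(test_a) == rank_order(test_b)
same_order_ac = rank_order(test_a) == rank_order(test_c)
print("A and B rank subjects alike:", same_order_ab)
print("A and C rank subjects alike:", same_order_ac)
```

Tests A and B differ in average magnitude yet order the subjects
identically, whereas A and C share the same average and disagree about the
ordering.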


One striking difference in the magnitude of ear advantages has been
discovered in recent research using the single-response paradigm (see Repp,
1976b, 1976c, 1977a, and footnote 6a): partially fused syllables (voicing
and double-feature contrasts) yield much larger ear advantages than perfectly
fused syllables (place contrasts). This result is methodologically
interesting, because larger ear advantages are also likely to be more
reliable. The reason for this difference is not clear at present, except
that perfect fusion seems to play a role. The role of selective attention
with partially fused stimuli needs to be reassessed, although earlier studies
suggest that it is small (Halwes, 1969; Repp, 1976a). Future research will
concentrate on determining the factors that are responsible for this
difference between fused and partially fused syllables.



I am referring here to tests at the same level of complexity, varying only
in the auditory and phonetic properties of the stimuli. There is good
reason to believe that dichotic tasks of different complexity tap different
aspects of lateralization (Porter and Berlin, 1973;

A Simple Model of Response Selection in the Dichotic Two-Response Paradigm



Bruno H. Repp' 




ABSTRACT 


A simple random-guessing model of the dichotic two-response
paradigm is described. The model provides a way of calculating an
index of the ear advantage that takes guessing into account. It
also generates predictions of the proportions of single- and
double-correct responses at different performance levels. A
comparison with real data shows that the proportions of
double-correct responses are generally overpredicted. By
introducing an additional parameter reflecting limited channel
capacity, the model can be made to fit closer to empirical data,
but the value of the parameter is not the same for different sets
of data. While this model is oversimplified in many ways, it
nevertheless provides a rudimentary formal framework for the
interpretation of dichotic data.



INTRODUCTION

Despite a large amount of research and theoretical speculation on
dichotic listening, little thought has been given to formulating and testing
mathematical models of the response processes involved. The present paper
briefly examines the simplest conceivable formal model and derives some
predictions from it. The model is almost certainly an oversimplification.
However, the purpose of this exercise is to point out some basic relations
between several dependent variables in dichotic listening experiments. These
relations are likely to hold up approximately, even if the model that
predicts them is not precisely true, and they need to be taken into account
in the interpretation of dichotic data.

The present paper serves as an appendix to my methodological paper,
"Measuring Laterality Effects in Dichotic Listening" (Repp, 1977), to which
frequent reference will be made.

AN INDEPENDENT-CHANNELS MODEL WITH RANDOM GUESSING

In Section 1 of the preceding paper, I discussed the dichotic
two-response paradigm. This is the standard procedure that requires the
listener to identify both stimuli on each trial. The two responses must be
different from each other and are scored without regard to order. The
proportions of correct responses for the right and left ear are $P_R$ and
$P_L$, respectively, and the overall performance level is
$P_O = (P_R + P_L)/2$. I described what is probably the best index of ear
asymmetry (I called it e; it is basically identical with the f index of
Marshall, Caplan, and Holmes, 1975), and I derived a correction for guessing.
This correction is rather crude, based on linear interpolation between three
extreme cases. I pointed out that a formal model of guessing would provide a
more elegant solution.

The simplest model of response selection in the dichotic two-response
paradigm makes the following two assumptions: (1) the stimuli in the two
ears are perceived independently of each other; (2) a stimulus is either
perceived correctly or a random guess is made. Although the real situation
is almost certainly more complex, the predictions of such a simple model are
worth considering. If $P_R^*$ and $P_L^*$ are the "true" probabilities of
correctly perceiving the stimuli in the respective ears and N is the number
of stimuli in the experiment, then

$$P_R = P_R^* + (1 - P_R^*)\,P_L^*\,\frac{1}{N - 1} + (1 - P_R^*)(1 - P_L^*)\,\frac{2}{N}, \tag{1}$$

and

$$P_L = P_L^* + (1 - P_L^*)\,P_R^*\,\frac{1}{N - 1} + (1 - P_L^*)(1 - P_R^*)\,\frac{2}{N}. \tag{2}$$

The three additive terms in these equations are: (1) the probability of
correctly perceiving the stimulus in the ear concerned; (2) the probability
of making a correct guess when the stimulus in the other ear is correctly
identified; and (3) the probability of making a correct guess when no
stimulus is perceived correctly.
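
The three terms of Equations 1 and 2 can be written out directly in code. The following Python lines are a minimal sketch under the two assumptions stated above; the function name `observed_scores` and its argument names are illustrative choices, not taken from the source.

```python
def observed_scores(pr_true, pl_true, n):
    """Expected observed proportions correct (P_R, P_L), Equations 1 and 2.

    pr_true, pl_true: underlying probabilities of correct perception
    n: number of stimuli in the set (at least 2)
    """
    if n < 2:
        raise ValueError("at least two stimuli are required")

    def one_ear(own, other):
        perceived = own                                      # term 1
        guess_other_known = (1 - own) * other / (n - 1)      # term 2
        guess_none_known = (1 - own) * (1 - other) * 2 / n   # term 3
        return perceived + guess_other_known + guess_none_known

    return one_ear(pr_true, pl_true), one_ear(pl_true, pr_true)


if __name__ == "__main__":
    p_r, p_l = observed_scores(0.7, 0.5, 6)
    print(round(p_r, 4), round(p_l, 4))
```

With underlying probabilities of 0.7 and 0.5 and N = 6 (values chosen only for illustration), the sketch yields observed scores of 0.78 and 0.62.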

By taking the difference between these two equations, we find that

$$\begin{aligned}
d = P_R - P_L &= P_R^* - P_L^* - \frac{1}{N - 1}\,(P_R^* - P_L^*) \\
&= \frac{N - 2}{N - 1}\,(P_R^* - P_L^*) \\
&= \frac{N - 2}{N - 1}\,d^*.
\end{aligned} \tag{3}$$

Thus, the observed ear difference d is in a simple proportional relationship
to the underlying ear difference $d^*$. The proportionality factor is
identical with the largest possible expected d, $d_{max}$, for a given N
(Repp, 1977: Eq. 15). This becomes obvious by noting that if $d = d_{max}$,
then necessarily $d^* = d^*_{max} = 1$, so that $d_{max} = (N - 2)/(N - 1)$.

The numerical solution of Equations 1 and 2 for $P_R^*$ and $P_L^*$ is not
straightforward, so it will not be derived here. (The solution is found
most easily by a recursive procedure.) After estimates of $P_R^*$ and
$P_L^*$ are obtained, an appropriate index of the ear advantage is

$$e^* = \begin{cases}
\dfrac{P_R^* - P_L^*}{P_R^* + P_L^*} & \text{if } 0 \le P_O^* \le 0.5, \\[2ex]
\dfrac{P_R^* - P_L^*}{2 - P_R^* - P_L^*} & \text{if } 0.5 \le P_O^* \le 1.0.
\end{cases} \tag{4}$$
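
The source does not spell out the recursive procedure. One possible scheme, offered here only as a sketch, is a fixed-point iteration: start from the observed scores, subtract the guessing terms implied by the current estimates, and repeat. The function names, the iteration count, and the clipping to [0, 1] are assumptions of this sketch, and it is intended for N of 3 or more.

```python
def observed_scores(pr_true, pl_true, n):
    """Equations 1 and 2: expected observed proportions (P_R, P_L)."""
    def one_ear(own, other):
        return own + (1 - own) * other / (n - 1) + (1 - own) * (1 - other) * 2 / n
    return one_ear(pr_true, pl_true), one_ear(pl_true, pr_true)


def clip(x):
    """Keep an estimate inside the probability range [0, 1]."""
    return min(1.0, max(0.0, x))


def underlying_probabilities(p_r, p_l, n, iterations=200):
    """Estimate (P_R*, P_L*) from observed (P_R, P_L) by fixed-point iteration."""
    pr_true, pl_true = p_r, p_l          # start from the observed scores
    for _ in range(iterations):
        # Guessing contributions implied by the current estimates.
        guess_r = (1 - pr_true) * (pl_true / (n - 1) + (1 - pl_true) * 2 / n)
        guess_l = (1 - pl_true) * (pr_true / (n - 1) + (1 - pr_true) * 2 / n)
        pr_true, pl_true = clip(p_r - guess_r), clip(p_l - guess_l)
    return pr_true, pl_true


def e_star(pr_true, pl_true):
    """Equation 4: ear-advantage index from the underlying probabilities."""
    total = pr_true + pl_true
    # P_O* <= 0.5 uses the sum; P_O* > 0.5 uses its complement.
    denominator = total if total <= 1.0 else 2.0 - total
    if denominator == 0:
        return 0.0                       # both probabilities 0 or both 1
    return (pr_true - pl_true) / denominator
```

As a usage example, feeding the observed scores 0.78 and 0.62 (N = 6) to `underlying_probabilities` recovers values very close to 0.7 and 0.5, for which `e_star` is approximately 0.25.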

Of course, Equation 4 is identical with the formula for the e index
before the correction for guessing (Repp, 1977: Eq. 12), except that
observed scores $P_R$ and $P_L$ are replaced by underlying probabilities
$P_R^*$ and $P_L^*$ that are already corrected for guessing. One might
expect $e_g$ (the e index after the correction for guessing proposed in
Repp, 1977: Eq. 20) to be identical with $e^*$, but this is not quite true,
as illustrated in Figure 1.

Figure 1: Four different indices of the ear advantage as a function of $P_O^*$
and $d^*$.



Figure 1 shows four laterality indices as a function of two variables:
$P_O^*$ and $d^*$, the average and the difference, respectively, of the two
underlying probabilities $P_R^*$ and $P_L^*$. These are not isolaterality
contours, which are horizontal straight lines in Figure 1. Rather, each curve
describes the value of the relevant index for a constant $d^*$, so they are
"isodifference" contours.



The index e was proposed by Halwes (1969) and Marshall et al. (1975)
without a correction for guessing (Repp, 1977: Eq. 12), whereas $e_g$
incorporates the correction for guessing proposed in Repp (1977: Eq. 20). This
correction has the effect of bending the left parts of the e functions
($P_O^* < 0.5$) upward, so that the functions become U-shaped and nearly
symmetric. However, they are not perfectly symmetric, as the $e^*$ functions
are. The reason for this will become clear in the next section. Here we note
only that $e_g$ and $e^*$ are nearly identical, which shows that the rough
correction for guessing proposed earlier is compatible with the simple
guessing model discussed here. Therefore, this correction should suffice for
all practical purposes, and it generally will not be necessary to actually
compute $e^*$.

The fourth index, $e_s$, was not discussed in Repp (1977) but deserves a
brief comment here. Studdert-Kennedy and Shankweiler (1970) proposed to
consider only single-correct trials for computing an index of the ear
advantage, since double-correct responses provide no information about ear
asymmetry. If $P_{RS}$ and $P_{LS}$ are the proportions of single-corrects
for the two ears, and $P_S = P_{RS} + P_{LS}$, then $P_{RS}/P_S$ constitutes
an index of the ear advantage; or, alternatively,

$$e_s = \frac{P_{RS} - P_{LS}}{P_S} \tag{5}$$

is an equivalent index that ranges from -1 to +1. This index is plotted as a
function of $P_O^*$ in Figure 1, together with the other indices discussed
earlier. The simple guessing model provides a useful theoretical comparison
of different indices.
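
Equation 5 is easy to compute from the scores usually reported. The sketch below assumes that the single-correct proportions are obtained from the per-ear scores and the double-correct proportion as $P_{RS} = P_R - P_D$ and $P_{LS} = P_L - P_D$ (the relation the paper itself uses in Equation 8); the function name `e_s` is an illustrative choice.

```python
def e_s(p_r, p_l, p_d):
    """Equation 5: ear-advantage index based on single-correct trials only.

    p_r, p_l: observed proportions correct for the right and left ear
    p_d: observed proportion of double-correct trials
    """
    p_rs = p_r - p_d     # right ear correct, left ear wrong
    p_ls = p_l - p_d     # left ear correct, right ear wrong
    p_s = p_rs + p_ls    # all single-correct trials
    if p_s <= 0:
        raise ValueError("no single-correct trials; the index is undefined")
    return (p_rs - p_ls) / p_s
```

For example, with observed scores of 0.78 and 0.62 and a double-correct proportion of 0.50 (illustrative values), the index is approximately 0.4.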

First of all, $e_s$ obviously needs a correction for guessing that still
needs to be derived. Secondly, $e_s$ is clearly different from e, leading to
larger values throughout. This is not necessarily an argument against $e_s$;
as long as two indices are perfectly correlated (as they seem to be), one is
as good as the other for ordinal measurement. The index $e_s$ is based on
increasingly fewer observations as $P_O$ increases, so that its variability
increases and its reliability decreases; however, the same is true for e.
Thus, it seems that with an appropriate correction for guessing, $e_s$ would
be an acceptable alternative to $e_g$ or $e^*$. On the other hand, however,
there is no reason why $e_s$ should be used instead of e, for which the
correction for guessing has been worked out and which is just as easy to
compute. Certainly e and $e_s$ indices are not directly comparable because
they represent different scales of the ear advantage. Therefore, to maintain
uniformity and comparability from study to study, $e_s$ is not recommended
for general use.



SINGLE- AND DOUBLE-CORRECT RESPONSES

Both $e_g$ and $e^*$ presuppose the validity of the model outlined in the
first section. It is important to determine to which degree this model
actually fits real data. One way of testing it consists in examining its
predictions of the proportions of double-correct and single-correct responses
at different performance levels.

The proportion of double-correct responses, $P_D$, predicted by the simple
guessing model is

$$P_D = P_R^* P_L^* + (1 - P_R^*)\,P_L^*\,\frac{1}{N - 1} + P_R^*\,(1 - P_L^*)\,\frac{1}{N - 1} + (1 - P_R^*)(1 - P_L^*)\,\frac{2}{N(N - 1)}. \tag{6}$$

The proportion of single-correct responses, $P_S$, is

$$P_S = P_R^*\,(1 - P_L^*)\,\frac{N - 2}{N - 1} + P_L^*\,(1 - P_R^*)\,\frac{N - 2}{N - 1} + (1 - P_R^*)(1 - P_L^*)\,\frac{4(N - 2)}{N(N - 1)}. \tag{7}$$

Alternatively, $P_S$ may be obtained by subtraction:

$$P_S = P_{RS} + P_{LS} = (P_R - P_D) + (P_L - P_D) = P_R + P_L - 2P_D. \tag{8}$$

Of course, the overall performance level is

$$P_O = (P_R + P_L)/2 = P_S/2 + P_D. \tag{9}$$
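
The predicted proportions can be computed along the same lines. In the sketch below, $P_D$ follows Equation 6 term by term, while $P_S$ is obtained by subtraction (Equation 8), which keeps the identity of Equation 9 exact by construction. Function names are illustrative assumptions, not part of the source.

```python
def observed_scores(pr_true, pl_true, n):
    """Equations 1 and 2: expected observed proportions (P_R, P_L)."""
    def one_ear(own, other):
        return own + (1 - own) * other / (n - 1) + (1 - own) * (1 - other) * 2 / n
    return one_ear(pr_true, pl_true), one_ear(pl_true, pr_true)


def double_correct(pr_true, pl_true, n):
    """Equation 6: predicted proportion of double-correct trials, P_D."""
    both = pr_true * pl_true                                  # both perceived
    one = (1 - pr_true) * pl_true + pr_true * (1 - pl_true)   # one perceived
    none = (1 - pr_true) * (1 - pl_true)                      # neither perceived
    return both + one / (n - 1) + none * 2 / (n * (n - 1))


def single_correct(pr_true, pl_true, n):
    """Equation 8: P_S by subtraction, P_R + P_L - 2 * P_D."""
    p_r, p_l = observed_scores(pr_true, pl_true, n)
    return p_r + p_l - 2 * double_correct(pr_true, pl_true, n)


if __name__ == "__main__":
    p_d = double_correct(0.7, 0.5, 6)
    p_s = single_correct(0.7, 0.5, 6)
    print(round(p_d, 4), round(p_s, 4), round(p_s / 2 + p_d, 4))
```

For underlying probabilities of 0.7 and 0.5 with N = 6 (illustrative values), this gives $P_D$ = 0.46 and $P_S$ = 0.48, so that $P_S/2 + P_D$ = 0.70, the mean of the observed scores 0.78 and 0.62.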

Figure 2 shows $P_O$, $P_S$, and $P_D$ as a function of $P_O^*$ for the
special case of N = 6. For each dependent variable, two functions are shown.
One is curvilinear and represents the case of no ear advantage, d = 0. The
other consists of two linear segments and represents the case of the maximal
underlying ear difference at a given $P_O^*$, $d^* = d^*_{max}$ (given
$P_O^*$) (cf. Repp, 1977: Eq. 9). The functions for constant values of $d^*$
between 0 and $d^*_{max}$ (given $P_O^*$) fall between the two extremes shown
in Figure 2 and are parallel to the curvilinear function. The differences in
proportions brought about by an increase in $d^*$ from $d^* = 0$ are shown in
detail in Figure 3. Here it can be seen more clearly that not only $P_D$ and
$P_S$ but also $P_O$ depend on $d^*$ (and, hence, on d), as well as on
$P_O^*$. Thus, the observed performance level $P_O$ is not completely
independent of the observed ear difference d; according to the model, there
is a slight negative relationship. This is the reason why $e_g$ and $e^*$ do
not coincide in Figure 1. Only $e^*$, which is directly based on the model,
takes the interdependence of $P_O$ and d into account. However, while this
effect is interesting from a theoretical viewpoint, it is negligible for
practical purposes.

The $P_D$ and $P_S$ functions are of special interest. From Figure 2, it can
be seen that, as performance level increases from chance, both $P_D$ and
$P_S$ increase at first, but soon $P_S$ begins to decrease rapidly while
$P_D$ continues to increase steadily. Figures 2 and 3 permit comparisons with
real data. If the observed proportions $P_O$, d, $P_S$, and $P_D$ are known,
predicted proportions $P_S$ and $P_D$ are found as follows: first, $d^*$ is
determined from Equation 3; then $P_O$ is located in Figure 2 on an
interpolated function appropriate for $d^*$ (the effect of $d^*$ is so small
that it may be ignored); then, the values of $P_S$ and $P_D$ for this $P_O$
are determined on the ordinate in Figure 2; finally, $P_S$ and $P_D$ are
corrected for the effect of $d^*$ by Figure 3.

Figure 3: $P_O$, $P_S$, and $P_D$ as a function of $d^*$.

TABLE 1:

A = Studdert-Kennedy and Shankweiler (1970)
B = Cullen et al. (1974)
C = Porter et al. (1976)
D = Tobey et al. (1976)
E = Berlin et al. (1973)

Note: All studies used natural speech and the six stop consonants
preceding /a/ (study A used CVC utterances).



Several studies in the literature report all the necessary parameters
for several experimental conditions with different performance levels. These
data and the predictions from the model are shown in Table 1. It can be seen
that the model overpredicts $P_D$ and underpredicts $P_S$ in all cases but
one. In other words, the observed proportions of double-correct responses are
consistently smaller than predicted by the model. This indicates a negative
dependency between $P_R^*$ and $P_L^*$, such that the probability of
perceiving the stimulus in one ear correctly is reduced if the stimulus in
the other ear has already been perceived correctly. This effect is plausible
in view of factors like fusion, selective attention, and memory, all of which
tend to reduce perceptual accuracy for one channel to the degree that they
increase accuracy for the other.

NONINDEPENDENCE OF CHANNELS

The model represented by Equations 1 and 2 assumed that errors in
dichotic performance arise only from a very general form of processing
limitation that reduces accuracy for both ears relative to monaural
performance, but permits independent perception of the degraded stimuli in
each ear (cf. the "perceptual noise" hypothesis of Repp, 1975a, 1975b).
However, the relatively poor fit of the model indicates that this assumption
is not sufficient. Apparently, there is, in addition, a more specific
processing limitation that makes it difficult to identify a second stimulus
after one stimulus has been correctly perceived. (This is one way of
conceptualizing the problem.) This limitation can be easily modelled by
introducing one additional parameter into the model: Let us assume that the
conditional probability of perceiving the stimulus in one ear correctly,
given that the stimulus in the other ear has already been correctly
identified, is reduced by a multiplicative factor c with respect to the same
probability, given that the stimulus in the other ear has not been correctly
identified. Thus,

$$P_R^* \mid \text{L correct} = c\,\bigl(P_R^* \mid \text{L not correct}\bigr), \tag{10}$$

and

$$P_L^* \mid \text{R correct} = c\,\bigl(P_L^* \mid \text{R not correct}\bigr). \tag{11}$$

The constant c varies between 0 and 1; c = 0 indicates that, if the
stimulus in one ear is correctly identified, the other stimulus can never be
correctly identified except by a random guess; c = 1 indicates complete
independence of the two channels. The full model, stated in terms of $P_D$
and $P_S$, that will now be called $P_D'$ and $P_S'$, respectively, is:

$$P_D' = c\,P_R^* P_L^* + \frac{P_R^*(1 - cP_L^*) + P_L^*(1 - cP_R^*) + (1 - P_R^*)P_L^* + (1 - P_L^*)P_R^*}{2(N - 1)} + (1 - P_R^*)(1 - P_L^*)\,\frac{2}{N(N - 1)}, \tag{12}$$

and

$$P_S' = \bigl[P_R^*(1 - cP_L^*) + P_L^*(1 - cP_R^*) + (1 - P_R^*)P_L^* + (1 - P_L^*)P_R^*\bigr]\,\frac{N - 2}{2(N - 1)} + (1 - P_R^*)(1 - P_L^*)\,\frac{4(N - 2)}{N(N - 1)}. \tag{13}$$

In this version of the model, it makes a difference which channel is
processed first; this results in the additional terms in the equations and in
the additional "2" in the denominator. The simplifying assumption needs to be
made that each channel is equally likely to be processed first, so that ear
differences rest solely on differences between $P_R^*$ and $P_L^*$. (Relaxing
this assumption would lead to a more complex model that cannot be considered
here.)

The effect of a decrease in c from 1 towards 0 is best illustrated by
the differences between $P_D$ and $P_D'$ and between $P_S$ and $P_S'$.
Subtracting Equation 12 from Equation 6, one obtains after some rearrangement
of terms,

$$P_D - P_D' = P_R^* P_L^*\,(1 - c)\,\frac{N - 2}{N - 1}, \tag{14}$$

and subtracting Equation 13 from Equation 7,

$$P_S - P_S' = -P_R^* P_L^*\,(1 - c)\,\frac{N - 2}{N - 1}. \tag{15}$$

Thus, $P_S$ and $P_D$ change in precisely complementary fashion, resulting in
a decrease in $P_O$ as c decreases (cf. Equation 9). This is illustrated in
Figure 4, which shows $P_O'$, $P_D'$, and $P_S'$ as a function of $P_O^*$ for
three values of c: 0, .5, and 1. It is assumed that N = 6 and
$d^* = P_R^* - P_L^* = 0$. Since
[...] and Table 1), the effect of ear differences on these functions may be
neglected with little loss in accuracy (cf. Figure 3). The functions for c
= 1 in Figure 4 are identical with the curvilinear functions in Figure 2. It
can be seen that complete negative dependency between the two channels (c =
0) reduces the maximal expected performance level to 0.6 and the maximal
expected proportion of double-correct responses to 0.2.
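
The extended model can be checked numerically. The sketch below implements the double-correct proportion with the dependency factor c and derives the single-correct proportion from the same three cases (both, one, or neither stimulus perceived). The guessing term for the case in which neither stimulus is perceived is obtained here by counting: two different random guesses out of N contain exactly one of the two targets with probability 4(N - 2)/[N(N - 1)]. Function names are illustrative assumptions.

```python
def case_probabilities(pr_true, pl_true, c):
    """Probabilities that both, exactly one, or neither stimulus is perceived,
    averaged over which channel is processed first (each equally likely)."""
    both = c * pr_true * pl_true
    one = (pr_true * (1 - c * pl_true) + pl_true * (1 - c * pr_true)
           + (1 - pr_true) * pl_true + (1 - pl_true) * pr_true) / 2
    none = (1 - pr_true) * (1 - pl_true)
    return both, one, none


def double_correct_c(pr_true, pl_true, n, c):
    """Equation 12: predicted double-correct proportion with dependency c."""
    both, one, none = case_probabilities(pr_true, pl_true, c)
    return both + one / (n - 1) + none * 2 / (n * (n - 1))


def single_correct_c(pr_true, pl_true, n, c):
    """Predicted single-correct proportion with dependency c."""
    _, one, none = case_probabilities(pr_true, pl_true, c)
    return one * (n - 2) / (n - 1) + none * 4 * (n - 2) / (n * (n - 1))


def performance_c(pr_true, pl_true, n, c):
    """Equation 9 applied to the extended model: P_S'/2 + P_D'."""
    return (single_correct_c(pr_true, pl_true, n, c) / 2
            + double_correct_c(pr_true, pl_true, n, c))


if __name__ == "__main__":
    # Perfect underlying perception, N = 6, complete negative dependency.
    print(round(double_correct_c(1.0, 1.0, 6, 0.0), 4),
          round(performance_c(1.0, 1.0, 6, 0.0), 4))
```

With c = 0, N = 6, and both underlying probabilities equal to 1, the sketch reproduces the ceilings stated in the text: a double-correct proportion of 0.2 and a performance level of 0.6.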

The model is compared to the data of Table 1 in Figure 5. The $P_D'$ and
$P_S'$ functions of Figure 4 for c = 0 and c = 1 have been replotted here
with $P_O$, the expected or observed performance level, on the abscissa, in
order to facilitate comparisons with real data. The long curvilinear
functions are for c = 1, the short linear functions for c = 0. Functions with
c between 0 and 1 lie between these two extremes, starting at the same point
at the left and extending up to a point on the long linear segments that
represent the maximal expected scores for different values of c. Only the
observed $P_S$ scores from Table 1 are plotted. (The differences between
observed and predicted $P_S$ scores are exactly twice as large as those
between observed and predicted $P_D$ scores, and therefore make discrepancies
easier to see.)

Either of two conclusions can be drawn from Figure 5. If all data
points are to be fit by a single function (and it seems that they could be),
then the model is incorrect, for it cannot generate this function. On the
other hand, it is possible that different experiments, stimuli, or groups of
subjects require different functions. The three data points of study A
(Studdert-Kennedy and Shankweiler, 1970) are fit by a function with c = 1,
indicating virtual independence of channels. Eight of the other twelve data
points seem to be fit by a function with approximately c = .3 (that has been
drawn in Figure 5), indicating substantial negative dependencies between
channels. The other data points require intermediate values of c, except for
one point that falls completely outside the range of the model. Variations
in c as a function of stimuli or subjects are not implausible. The stimuli
of study A, for example, were different in several ways from those of studies
B - E, which all come from the same laboratory. In this case, c may serve as
an indicator of the degree of channel interaction (for example, fusion) in an
experiment.

While further research will be required to evaluate the usefulness of
the present model for making global predictions, it is clear that the model
is not sufficient at a detailed level of analysis. For example, it could not
explain stimulus dominance or the feature-sharing effect (see Repp, 1977).
However, its gross predictions are likely to be not too far from the truth.
The model has implications for researchers who have focused on $P_D$ as a
possible indicator of auditory processing capacity that is semi-independent
of overall performance level (Berlin, Hughes, Lowe-Bell, and Berlin, 1973;
Dermody and Noffsinger, 1976; Tobey, Cullen, and Rampp, 1976). The results
of two such studies are included in Table 1 and in Figure 5. Tobey et al.
(1976) noted that their two groups of subjects (children with and without
auditory processing disorders) did not differ in $P_S$, but only in $P_D$.
Similarly, Berlin et al. (1973) found that $P_D$ increased with age, while
$P_S$ decreased, but to a much lesser extent. As can be seen in Table 1 and
the figures, both findings are predicted by the present model. The subjects
in both studies performed at relatively low levels, where $P_S$ is nearly
constant with changes in performance level. Therefore, the findings should be
ascribed to changes in overall performance level, not to any specific factor
reflected by $P_D$ alone.

Acoustic Correlates of Perceived Prominence in Unknown Utterances
Jane H. Gaitenby and Paul Mermelstein



ABSTRACT

    Intensity, fundamental frequency, and syllable duration all
    contribute to the perception of relative prominence among the
    syllables of continuous speech. These acoustic parameters were
    studied for their relative ability to predict syllabic prominence
    in a corpus of 24 imperative sentences by four talkers. The
    sentences were constructed with controlled syntax and limited
    vocabulary, as may be the case for speech communication with
    machines. Of the individual prosodic parameters, the best predictor
    of perceptual prominence was the maximum frequency-weighted
    intensity value for the syllable, relative to the maxima of the
    neighboring syllables. Duration and fundamental frequency were
    significantly poorer prominence predictors. A linear combination
    of relative intensity and duration was the best multi-parameter
    predictor. In polysyllabic words, perceived relative prominence
    ratings agreed with the intrinsic lexical stress patterns in
    essentially all cases. When prominence was predicted from relative
    intensity measurements, it agreed with the lexical stress contours
    for 90 percent of the words; combined relative intensity and
    duration brought the agreement to 92 percent.

INTRODUCTION

Prosodic features structure the speech signal at the suprasegmental
level. They serve to organize sequences of syllables into words and phrases.
Lexical stress is an important cue to word identity, and automatic stress
indication for hypothesized syllable sequences can be expected to assist in
the determination of the corresponding word sequences in speech recognition.

Duration, intensity, and the fundamental frequency contour have been
previously suggested as acoustic correlates of linguistic stress (Mol and
Uhlenbeck, 1956; Bolinger, 1958; Lehiste and Peterson, 1959; Lieberman,
1960). Since stress or prominence judgments can be considered to be
associated with the individual syllables, it is of interest to obtain
syllable-based measures for the prosodic parameters. If prominence
predictors can be based on single measurements per parameter rather than
parameter contours, a significant data reduction can be attained. Such
measurements would truly reflect the suprasegmental aspects of the prosodic
parameters if based on automatically derived syllable-sized units without
regard to the segmental structure of such units. In the experiment to be
described, we have characterized the duration, intensity, and fundamental
frequency contours in terms of differences between adjacent syllables in the
duration of the voiced subsegment of the syllable, the peak
frequency-weighted intensity, and the peak fundamental frequency; and have
determined the effectiveness of these measurements, individually and in
combination, in predicting syllabic prominence and lexical stress for a
limited amount of speech.

The relative importance of the three acoustic correlates in signaling
stress in English has also received much attention. Conflicting claims
abound, perhaps due to the different types of speech materials studied by the
various investigators (Fry, 1958; Lieberman, 1967; Lehiste, 1970).
Information concerning the phonemic content of acoustic segments is
frequently signaled by a number of distinct acoustic features. Some features
are necessary, others are optional. In the appropriate context, and when
appropriate values are assigned to all the other features, variation of the
value of each optional feature generally suffices to change the phonetic
identity of the segment. It is not surprising that a similar situation is
found for prosodic features. Sometimes one feature carries a heavier
information load, sometimes another.

This paper is concerned with the acoustic correlates of syllable
prominence (including lexical stress) in speech spoken with a limited
vocabulary and with controlled syntax, as if to a speech-understanding
automaton. Speech-understanding systems, for the foreseeable future, will
not be able to recognize and respond to utterances selected from a natural
language in its entirety. The necessity of controlling the vocabulary and
syntax of the acceptable utterances will impose its own influences on the
prosody of the spoken materials. The reported experiments therefore analyze
the acoustic correlates of prominence in just such utterances.
Generalization of the results to other modes of speech, such as fluent
conversation or script readings, may not be warranted.

Background 

Prosodic features have not yet been widely exploited for purposes of
automatic speech recognition, although suggestions have been frequent that
such features would prove useful. This lack of exploitation is due to the
variable and intricate nature of the prosodic parameters in continuous
speech, and to the consequent comparative rarity of publications that
quantitatively describe intonation, rhythm, and rate in extensive speech
samples.

Progress has been reported, however, in acoustic detection of stress and
closely associated phenomena such as juncture. An outstanding example is the
series of studies of American English prosodics begun by Medress, Skinner,
and Anderson (1971) and continued by Lea and colleagues (1972, 1973a, 1973b,
1975, 1976a, 1976b). Lea's latest report concludes that stress is best
determined by combinations of prosodic cues, of which long chunks of high
energy and local increases in fundamental frequency are the most effective.
Lea has reported results ranging from 63 percent to 92 percent correct
location of perceived stresses, depending on the corpus examined.

In the same study, Lea reports that the "primary simple cues to stress
are (in order of increasing effectiveness): high intensity in the stressed
vowel, long durations of stressed vowels or syllabic nuclei, and high F0
[fundamental frequency] values in the stressed vowel." Lea is not explicit
on the details of his stress detection method, but it appears to be based on
absolute measurements. One result of the study to be described here, in which
relative values are used, is that the reverse ranking was obtained.

Evidence was presented by Gaitenby (1974, 1975) that lexical stress in
fluently-read speech is, in the majority of cases, predictable by summing
weighted syllabic data for peak frequency, intensity, and duration of
voicing. Although the summation method appeared to have promise for
automatic stressed syllable location in words and phrases, a prerequisite to
its use, as Gaitenby implied, is the creation of an algorithm for detecting
syllable boundaries. Mermelstein (1975) demonstrated that automatic
segmentation of syllables in continuous speech was feasible with small error
rates. This suggested that automatic prominence indication might possibly be
attained through assignment of an integrated prominence measure to the
individual syllables.

The present experiment was undertaken to examine further the
effectiveness of individual and combined prosodic parameters in locating
stressed syllables. An additional consideration was an attempt to apply
syllable segmentation as the first step in arriving at reliable automatic
prominence detection for speech recognition purposes.

METHOD 

Speech Materials

In order to record samples of speech that more closely resemble
spontaneous utterances than do direct script readings, we instructed a group
of talkers to create sentences for themselves, although constraints were put
on the form and content of their utterances. Each talker was given a
state-transition chart constraining the syntax and vocabulary of the
sentences to be spoken. This diagram (shown in Figure 1) confined the syntax,
so that the utterances resembled commands that may be given to a
computer-based robot. The vocabulary was correspondingly limited. Each talker
was required to construct his or her sentences by reading left to right
across the diagram, selecting words from successive columns. The talker was
instructed to speak each selected sentence aloud a time or two, as rehearsal,
and then to deliver the sentence for the actual recording without referring
back to the diagram. We hoped that these instructions would result in some
degree of spontaneity in the recorded sentences.

Four talkers were used, two men and two women, all native speakers of
American English. Each talker recorded a minimum of six sentences in a
single session. Twenty-four sentences were selected for analysis, four by
Talker 1, six each by Talkers 2 and 3, and eight by Talker 4.

Figure 1: Syntax and vocabulary guide for sentence productions. The accompanying
instructions were: 1) Assemble a sentence by reading aloud from left to
right, following any connected path; 2) Rehearse the sentence aloud once
or twice; 3) Say the sentence in a normal manner. (For subsequent
sentences, repeat the process.)

Perceptual Measurements

Perceptual prominence judgments were collected to provide a standard to
which the acoustic measurements could be compared. The taped sentences were
presented to listeners in the following formal test of perceived prominence.
The subjects were told to mark the more prominent syllable of each pair of
adjacent syllables (syllable A vs. B, B vs. C, etc.) in every sentence.
Although individual subjects were required to make binary judgments of
prominence, indications of intermediate stress emerged in the pooled results.
This approach to listeners' stress judgments had been useful in a test we had
made in 1975 (unpublished), and a similar method was used by Lea et al.
(1973b) in evaluating perceived stress. (Stress and prominence may be
considered equivalent terms when judgments are made between contiguous
syllables of running speech.)

Five listeners were chosen from the laboratory staff. The test was
taken by each subject individually, using headphones in a quiet environment,
proceeding at his or her own pace. Each received a typescript of the spoken
sentences on which the overlapping sequential syllable pairs were indicated,
for example: [damaged, maged ta, table]. Listeners were instructed, in
Listening Test 1, to write down the syllable heard as prominent in each pair.
They were allowed to play the 24 taped sentences in any order, and to listen
as many times as necessary to arrive at judgments. Four out of the five
subjects finished the test in less than an hour, at a single sitting. None
of the five found the test difficult. An additional listener had found the
task impossible, and was not included in the final group.

To establish the consistency of listeners' prominence judgments, the
experiment was repeated. Listening Test 2, the same as Test 1 except that
the prominent syllable had to be check-marked rather than written out, was
presented to each of the subjects between a week and a month after the first
test. The results of Tests 1 and 2 appear as Appendices 1A and 1B, 1C and
1D.

In both tests, the initial and final syllable data in all sentences were
doubled to compensate for the fact that syllables in those positions could
receive only half the number of judgments received by the remaining
syllables. The maximum number of votes that a syllable could receive in
either test was 10, resulting from two comparisons, one with the preceding
and one with the following syllable, by each of the five listeners.
Ninety-five percent of the 274 syllables received the same number of pooled
prominence judgments in Test 2 as in Test 1. For only four syllables did as
many as three judgments (out of 10) shift in the second test. The
consistency of individual listeners in making judgments ranged from 87
percent to 91 percent, averaging 90 percent. In the pooled results of both
listening tests, more than half (54 percent) of the syllables received
unanimous judgments. The extent of inter-listener agreement is illustrated by
the overall correlation between the most and least consistent listeners:
0.86. Judgment consistency was significantly higher for the speech of Talker
1 and lower for Talker 2 than for the two other speakers. Similar talker
differences appeared in the correlations of the acoustic parameters with the
perceptual data, as shown in Table 1. The strongest conflicts in judgments
occurred in syllable pairs having comparable potential for prominence.



Examples are: [illegible] surfaces, in which inherently stressed syllables
abut; and pamphlets in, over a, under the, which are pairs of normally
unstressed syllables. The majority of these "high conflict" syllables either
flanked a pause or were approximately equal in duration, or both.
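The vote pooling described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `pooled_votes` and the encoding of each judgment as 0 (first syllable of the pair chosen) or 1 (second syllable chosen) are assumptions made for the example.

```python
from typing import List


def pooled_votes(pair_choices: List[List[int]], n_syllables: int) -> List[int]:
    """Pool pairwise prominence judgments into per-syllable vote counts.

    pair_choices[k][i] is listener k's judgment for the overlapping pair
    (syllable i, syllable i + 1): 0 if the first syllable was heard as
    prominent, 1 if the second (hypothetical encoding).
    """
    votes = [0] * n_syllables
    for choices in pair_choices:
        if len(choices) != n_syllables - 1:
            raise ValueError("one judgment is needed per adjacent pair")
        for i, choice in enumerate(choices):
            votes[i + choice] += 1
    # Edge syllables take part in only one comparison per listener,
    # so their counts are doubled, as described in the text.
    votes[0] *= 2
    votes[-1] *= 2
    return votes
```

With five listeners, an interior syllable chosen in both of its comparisons by every listener reaches the maximum of 10 votes, and a doubled edge syllable reaches the same maximum.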



TABLE 1: Correlation coefficients for acoustic parameters and perceived
prominence.

                         Talker 1  Talker 2  Talker 3  Talker 4  Overall

No. of Sentences             4         6         6         8        24

No. of Syllables            47        69        63        95       274

Relative Intensity         0.76      0.70      0.74      0.70      0.70

Relative Duration          0.70      0.44      0.53      0.45      0.52

Relative Frequency         0.44      0.21      0.38      0.55      0.38

Relative Intensity &       0.81      0.73      0.77      0.77      0.77
Relative Duration



Acoustic Measurements

The data were automatically segmented into syllable-sized units, using
minima in a frequency-weighted intensity function as likely syllable
boundaries (Mermelstein, 1975). The intervals within the syllabic units
manifesting voicing, as evidenced by a significant amount of low-frequency
energy (0-300 Hz), were next delimited. (It should be noted that the
"syllabic unit" is not necessarily exactly equivalent to the perceived
syllable, where phonological and lexical criteria may play a significant
role.) We attempted to weight the intensity function when integrated over
frequency so that it approximated perceptual loudness. The weighting function
was flat between 500 Hz and 4 kHz and dropped off at 12 dB/octave outside
these frequencies. The maximum of this weighted intensity function over the
voiced portion of the syllabic unit was assigned as the peak intensity of the
syllable. Fundamental frequency values were computed for voiced intervals
using an autocorrelation-based pitch extraction program (Lukatela, 1973), and
the peak frequency for the syllable was determined. The algorithm-based
measurements were cross-checked against wideband spectrograms of all 24
sentences generated with the Digital Pattern Playback (Nye, Reiss, Cooper,
McGuire, Mermelstein, and Montlick, 1975). The spectrograms, each hard copy
displaying 6 sec of speech, were augmented by frequency and weighted
intensity curves.
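The weighting described above (flat from 500 Hz to 4 kHz, falling at 12 dB/octave outside that band) can be sketched as below. The function names and the representation of a spectrum as a list of band center frequencies with linear power values are assumptions for illustration; the original analysis program is not specified in that detail.

```python
import math


def weighting_db(freq_hz: float, low_hz: float = 500.0, high_hz: float = 4000.0,
                 slope_db_per_octave: float = 12.0) -> float:
    """Gain in dB: 0 dB inside the flat band, falling 12 dB per octave outside it."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if freq_hz < low_hz:
        return -slope_db_per_octave * math.log2(low_hz / freq_hz)
    if freq_hz > high_hz:
        return -slope_db_per_octave * math.log2(freq_hz / high_hz)
    return 0.0


def weighted_level_db(band_freqs_hz, band_powers) -> float:
    """Sum the weighted band powers (linear units) and return the level in dB."""
    total = sum(p * 10.0 ** (weighting_db(f) / 10.0)
                for f, p in zip(band_freqs_hz, band_powers))
    return 10.0 * math.log10(total)
```

Under this sketch, the peak intensity of a syllabic unit would be the maximum of `weighted_level_db` over the frames of its voiced portion.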

Up to this stage, data were collected without knowledge of the specific
verbal content of the speech material. To correct possible syllabification
errors output by the segmentation program, the recordings were then listened
to. It was found that the algorithm had successfully detected 93 percent of
the 274 syllables. The errors were the following: 9 cases in which one
syllable was subdivided into 2, one case of 1 syllable subdivided into 3 (in
[bleid] of "razorblade"), and seven two-syllable sequences that were not
divided. The most frequent cause of more syllables having been indicated by
the program than were heard was the presence of a clear dip in intensity
within a diphthong. Too few syllables had been indicated most often in
two-syllable sequences in which at least one syllable was unstressed, and in
which the phone at the common boundary between the syllables was a semivowel
(/r/ or /l/) or syllabic /n/. These errors were hand-corrected. In
addition, syllabic units shorter than 51.2 ms were discarded, being deemed
too brief to have syllable status in the particular utterances.

The acoustic measurements were converted to units similar to those used
in Gaitenby's 1975 study, namely: peak fundamental frequency, 4 Hz; duration
of voicing, 12.8 ms; and intensity, 1 dB. The parametric data,
in these units, are given in Appendices 2A and 2B, 2C and 2D. [Talkers 1 and
2 (the female speakers) had fundamental frequency ranges that were much
higher than those of the male talkers. To take this into consideration, the
F0 data for the lowest peak in each sentence became the baseline for the
frequency measurements.]
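A minimal sketch of this unit conversion for one sentence is given below. The unit sizes (4 Hz, 12.8 ms, 1 dB) and the per-sentence F0 baseline come from the text; the function name `to_units` and the use of rounding to the nearest unit are assumptions, since the text does not say how fractional values were handled.

```python
def to_units(f0_peaks_hz, voicing_ms, intensity_db):
    """Convert per-syllable measurements for one sentence to coarse units.

    Frequency: 4 Hz steps above the lowest F0 peak in the sentence.
    Duration of voicing: 12.8 ms steps. Intensity: 1 dB steps.
    Rounding to the nearest step is an assumption of this sketch.
    """
    baseline_hz = min(f0_peaks_hz)  # lowest peak in the sentence is the baseline
    f_units = [round((f - baseline_hz) / 4.0) for f in f0_peaks_hz]
    d_units = [round(d / 12.8) for d in voicing_ms]
    i_units = [round(i) for i in intensity_db]
    return f_units, d_units, i_units
```

For example, F0 peaks of 200, 212, and 180 Hz give a baseline of 180 Hz and frequency values of 5, 8, and 0 units.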

RESULTS 

Correlation coefficients between each measurement and the perceptual
prominence scores were selected as indicators of the effectiveness of any one
measurement in predicting perceptual prominence. (A preliminary result was
that absolute intensity predicted prominence at approximately the same rate
as the parameter summing method mentioned in the Background section of this
report. The correlation coefficient for absolute intensity and perceived
prominence was 0.34.) Since the perceptual judgments were determined relative
to the prominence of the neighboring syllables, the intensity, frequency,
and duration measurements were converted to relative measures through the use
of the following local difference function on groups of three consecutive
syllables:

    M'_x = 2M_x - M_{x-1} - M_{x+1}

where M'_x is a relative measure for syllable x (peak frequency, peak
intensity, or duration), and M_x is the absolute measure for the same
variable. For the initial and final syllables we used

    M'_x = 2(M_x - M_{x+1})

and

    M'_x = 2(M_x - M_{x-1}),

respectively.
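The local difference function can be sketched as below. The edge-syllable forms, twice the difference from the single neighbor, are taken from the text. The interior form, 2M_x - M_{x-1} - M_{x+1}, is an assumption chosen to be consistent with those edge forms, and the function name is hypothetical.

```python
def relative_measure(values):
    """Convert absolute per-syllable measures to relative ones.

    Interior syllables: 2*M[x] - M[x-1] - M[x+1] (assumed form).
    Initial syllable:   2*(M[x] - M[x+1]).
    Final syllable:     2*(M[x] - M[x-1]).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two syllables are needed")
    relative = []
    for x in range(n):
        if x == 0:
            relative.append(2 * (values[0] - values[1]))
        elif x == n - 1:
            relative.append(2 * (values[x] - values[x - 1]))
        else:
            relative.append(2 * values[x] - values[x - 1] - values[x + 1])
    return relative
```

The same function would be applied separately to peak frequency, peak intensity, and duration of voicing within each sentence.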

The resulting correlation coefficients are given in Table 1. It is
apparent that, of the individual measurements, relative intensity is the
single best correlate of relative prominence. In terms of the overall
results, duration is the second-best correlate and fundamental frequency the
weakest. However, when we look at the correlation coefficients for the texts
of individual talkers, duration is less highly correlated with prominence
than frequency for one talker. There appear to be significant differences in
the way the various talkers encode the prominence information in terms of the
three prosodic parameters.
We next investigated whether a linear combination of the two most
effective parameters would prove more useful than either of them alone. The
two best individual parameters, relative intensity and duration, showed a
correlation coefficient of [value illegible] with respect to each other.
Using multiple regression techniques (McNemar, 196[illegible]), the resulting
best estimator for prominence from relative intensity and duration was
determined to be

    P_est = 0.59 I_rel + 0.32 D_rel,

where I_rel and D_rel are the relative intensity and duration measurements,
and P_est is the best estimated relative prominence. The correlation of this
new estimate with the prominence judgments was 0.77.
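The combined estimator and its evaluation can be sketched as follows. The weights 0.59 and 0.32 are the ones reported above; the function names and the plain Pearson correlation helper are illustrative assumptions, not a reconstruction of the original regression program.

```python
import math


def estimate_prominence(rel_intensity, rel_duration, w_int=0.59, w_dur=0.32):
    """Weighted sum of relative intensity and relative duration per syllable."""
    return [w_int * i + w_dur * d for i, d in zip(rel_intensity, rel_duration)]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, `pearson_r(estimate_prominence(I, D), judged)` would be computed over all syllables, with `judged` holding the pooled prominence votes.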

To judge the effectiveness of the above correlation figure, we attempted
to determine the disagreement to be expected between the prominence ratings
of different listeners. It is unlikely that the agreement between the
overall prominence judgments and that predicted from acoustic measurements
can exceed the agreement between prominence judgments of individual
listeners. Since each listener judged the spoken data twice, consistency
measures were available on the judgments by each listener. The most
consistent and least consistent listeners were selected to illustrate the
range of judgments one can expect. The overall correlation between the
judgments of these two subjects was 0.86. This figure, then, represents a
rough upper limit to which the correlation between the best estimate of
prominence and its judgment may be compared. Evidently, relative intensity
and duration are quite effective when used in combination to predict relative
prominence. Relative intensity alone is slightly poorer.

DISCUSSION

The results of the correlation analyses show that the ranking of the
single parameters as prominence cues is intensity first, duration second, and
fundamental frequency third, the reverse of the order found by Fry (1958)
and Lea (1976a). A main difference between this analysis and those, aside
from the type of intensity measurement, is that the present data are values
relative to the adjacent syllables. Another difference is that many of our
sample sentences were delivered rather slowly and hesitantly. The sizable
pitch excursions and strongly contrasting durations that may sometimes
accompany fluent speech tend to disappear in utterances that are hesitantly
or cautiously produced, with rhythmic phrasing lessening as stress tends to
be applied more evenly to all of the words in slow speech. When frequency
and duration are "under-used" as cues to prominence, the reliability of
intensity as a stress signal may increase due to the well-known trading
effects among the parameters.

Perceived relative prominence ratings agreed with intrinsic (that is,
dictionary) lexical stress in essentially every case. For the multisyllabic
words we attempted to predict lexical stress from acoustic measurements.
Relative intensity correctly indicated lexical patterns in 90 percent of the
(multisyllabic) words. Two words were responsible for 5 of the 8 errors
found: "beneath" and "maroon." Since /i/ or /u/ appeared in the stressable
syllable of these two words, normalization with respect to the average peak
intensity for these vowels was tried, but without significant result.
It was noted that the error words "beneath" and "maroon" usually preceded a
pause and were accompanied by a fall in peak F0 and a large rise in duration
relative to the preceding syllable. Duration was plainly the solitary
suprasegmental feature signaling prominence in those cases. Also noted, in
passing, was the fact that both of these words and "settee," another word
producing an error, are intrinsically stressed on the final syllable.

When the linear combination of relative intensity and relative duration
was used as a predictor of major lexical stress, the number of errors fell to
6; and in predicting secondary stress, one error occurred in the word
"razorblade." In total, the speech sample contained 77 polysyllabic words (37
different words, including 5 in both singular and plural forms). There were
11 trisyllabic tokens and 66 disyllables. Using the intensity and duration
combination for stress prediction, the words with errors were, as before, all
disyllabic: "beneath" (twice), "under" (twice), "any," and "settee." Again
the errors occurred often in prepausal words containing the phone /i/ in the
normally stressed syllable. Four out of 6 errors were in function words.
All error words occurred late in the sentences. It was noted too that the
meaning of both "beneath" and "under" is a factor that might influence
their prosodies to some extent. Combined intensity and duration predicted
the lexical stress patterns in 97 percent of the polysyllabic content words
and 69 percent of the function words. For all polysyllables, the
intensity-duration combination's stress prediction rate was 92 percent. This
figure equals the highest prediction rate for perceived stresses achieved by
Lea (1976a) and by Sargent (1975). Our 2 percent gain in overall stress
prediction, achieved by the inclusion of duration as well as intensity data,
may be hardly worth the increased complexity of the algorithm. Alone, as has
been shown, frequency-weighted intensity is a highly reliable stress
predictor.
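The percentages quoted above can be checked against the token counts given in the text: 11 trisyllabic plus 66 disyllabic tokens make 77 polysyllabic tokens, with 8 errors for intensity alone and 6 for the intensity-duration combination. The helper name below is hypothetical.

```python
def prediction_rate(n_tokens: int, n_errors: int) -> float:
    """Percentage of tokens whose lexical stress was predicted correctly."""
    return 100.0 * (n_tokens - n_errors) / n_tokens


n_polysyllables = 11 + 66  # trisyllabic + disyllabic tokens
intensity_only = prediction_rate(n_polysyllables, 8)
intensity_and_duration = prediction_rate(n_polysyllables, 6)
```

Rounded to whole percentages, these come to 90 and 92 percent, which matches the 2 percent gain reported for adding duration.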

Lea reported elsewhere (1976b, pp. 6-8) that his approach had succeeded
in detecting syllables at an 81 percent rate in a corpus that consisted of 15
statements, questions, and commands, and that 63 percent of the stressed
syllables had been located correctly. The 24 sentences we have examined here
are comparable in length to those used by Lea, but represent only commands,
and might be considered simpler in syntax. A precise comparison of
results is therefore difficult. Nevertheless, our overall syllable detection
rate was 93 percent, and the correlation of 0.77 between relative prominence
as predicted via intensity and duration and as perceptually judged suggests
that 85 to 90 percent of all syllables judged prominent would be located.

Automatic stress assignment requires the construction of decision rules
based on acoustic measurements such as those used here. In polysyllabic
words the simplest rule is to assign major lexical stress to the syllable in
the word found most prominent. For monosyllabic words, a simple threshold on
the relative prominence may suffice for stress assignment. Prominence
measures can, additionally, serve to predict the clarity with which the
acoustic information can be expected to be manifested.
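The two decision rules just described can be sketched as below. The function name, the data layout (one list of relative prominence values per word), and the default threshold of 0.0 are assumptions for illustration; the text does not propose a threshold value.

```python
def assign_stress(words, threshold=0.0):
    """Return, per word, the index of the stressed syllable or None.

    words: list of non-empty lists, each holding the relative prominence
    of the syllables of one word.
    Polysyllabic words: the most prominent syllable takes major stress.
    Monosyllabic words: stressed only if prominence exceeds the threshold
    (the threshold value here is a placeholder, not a reported figure).
    """
    stressed = []
    for prominence in words:
        if len(prominence) > 1:
            best = max(range(len(prominence)), key=prominence.__getitem__)
            stressed.append(best)
        else:
            stressed.append(0 if prominence[0] > threshold else None)
    return stressed
```

For instance, a disyllable with prominence values [3.0, -1.0] is stressed on its first syllable, while a monosyllable with value -2.0 is left unstressed.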


A few peripheral observations about the sample sentences are worthy of
mention. First, the limited syntactic structure used in our sample sentences
was meant not only to resemble commands to a robot, but also to reveal
structural relationships with prosodic features. So far, aside from certain
long pauses, evidence of regular prosodic reflections of syntax has not been
found in the data. This result may not be surprising in view of the fact
that only syllable peak data were examined and the utterances were
predominantly slow.

Second, pause length was extremely variable, both within and across
talkers, ranging up to 1.8 sec. Seventeen of the 24 sentences contained at
least one pause, and Talker 3 produced five of the sentences lacking any
pause. If more than one pause occurred in a sentence, the first was the
longest. Most of the pauses took place after the first noun (which preceded
the adverbial phrase) and thus appeared to have syntactic relevance. No
pauses occurred at any earlier location in a sentence. There were a few
cases of pause introduced between an adjective (or article) and the final
noun. In this position the word "a" was pronounced [ei] and "the" became an
elongated [ðə] or [ði]. Such hesitation effects had several possible causes:
the spatial design of the diagram given to the talkers as a guide to their
productions, the restricted vocabulary, and the constraints of the speaking
task as a whole. Average time intervals between stressed syllables and pause
length showed no dependable relationships.



Finally, Table 2 shows the intensity and frequency ranges for the four
individual talkers. The voices of the female speakers were "typically" high;
the men's were low. The women displayed not only a larger frequency range,
but also a smaller intensity range than that of both men. As expected, the
range and ratios of voicing duration were similar for all four talkers.
Generally speaking, there was as much variation in the absolute duration for
a given word within a single talker's speech as there was across the talkers.



TABLE 2: Ranges of intensity and frequency, by Talker

Talker              Intensity    Frequency

#1      Female      11.2 dB      108 Hz
#2      Female      12.8         110
#3      Male        14.7          72
#4      Male        16.3          83



CONCLUSIONS 



Relative prominence of syllables in continuous speech may be predicted
from syllable-based measurements with a reliability approaching the agreement
between individual listeners. The most and least consistent listeners in the
perceptual test of prominence showed a mutual correlation of 0.86. This
figure is a standard against which the acoustic predictions of prominence may
be evaluated. Of the three individual prosodic parameters, a relative
measure of spectrally-weighted intensity correlates most highly (0.70
overall) with perceived prominence in the (mostly) slow speech sample.
Syllable prominence is predicted more closely, however, by a combination of
relative intensity and relative duration of voicing, with an overall
correlation of 0.77. This combination predicts lexical stress in 92 percent
of polysyllabic content and function words.

The sample we have discussed includes utterances by only four talkers,
two men and two women; therefore the observation made on male versus female
prosodic differences can be considered only suggestive. The implication from
our very limited data is that female speakers have both a wider frequency
range and a narrower intensity range than males. Further research is plainly
needed on prosodic differences in male versus female speech. One question
is: to what extent are these differences affected by sociolinguistic
factors?

Research is also needed on the extent to which the use of the separate
prosodic cues to prominence changes with increasingly rapid speech for a
variety of speakers. A persisting related question is how speech material
and other factors influence speech rate and segmental duration. The
dependence of vowel duration on speech mode, for example, is highlighted in
Harris and Umeda (1974), where it is concluded that the role of prosody seems
to be very different in carrier phrases as opposed to connected text.

The present results provide further quantitative evidence that different
talkers may use their prosodic resources for prominence in different ways,
some using more intensity or frequency variations, others more durational
cues. Lieberman and Michaels (1962) have made a similar observation, that
individual talkers show prosodic differences in expressing emotional
attitudes. Nevertheless, in the present speech sample, the pattern of relative
intensity is the single feature shared dependably by all four talkers in
signaling prominence. Carefully spoken sentences, like those examined in
this report, may be the most recognizable form of connected speech input to
computers for some time. We suggest the use of the simple frequency-weighted
intensity measure for prominence prediction in this type of man-machine
communication task.
T1S1 Select each eraser down beneath all these seven desks.

2 o" 10 0 10 0 10 0 7 i ^... i 9 

Tot," 0 11 ly 0 20 0 ?o 0 u .17""' J ir o-ie- 

T1S2 Repair the damaged pamphlet above the eleventh chair.

1 0 10 0 ' 10 0" VJ 0 10 iJ 1 iO 0 b 



10 0 ■ 10 0 10 5 . 



•Q 10 1 10 0 3' 



Tot'. 0 20 '0 ?0 0 PO 10 0 ?0 8 2 20 0 
T1S3 Take the apple up from a big box.
I ') ■ 0 10 0 'J '6 0 9 i 



Tot. -W- 0 20 0 18 11 1 ,16 4- 
T1S4 Move the phonebooks out beneath seven shelves.

■ '5 0 10 u 10 a B T 0 5 
y 0 10 ,0 10 a 7 3 0 5 

Tot. -to- 0-20 0- 20 0 15 IM 

T2S1 Find a damaged razorblade down beneath all seven desks.

^ 0 10 0 10 0 5 ' 10 0 3 9 6 0 5 

i 0 10 0 10 0 ^ 10 0' 7 7 ' 6 0 5 

■ ■ 20- ■ ?0 
Tot. -10- 0 20 0 ?u 0 10 20 0 12 l6 1? 0 -10- 

T2S2 Remove each eraser up from any maroon box.

0 8 7 0 10 0 9 2 9 ^ 1> ' ^ 

G 7 (J 0 10 0 10 0 10 M 6 Ji 

16 

Tat. 0 lb li) .0 20 0 19 2 I') y p 12 4- 
T2S3 Paste some booklets up under each maroon shelf.

. p 0 10 0 8 7 0 10' 0 6 1| 



5 

2Q 

Tot. -19- 



0 10 ■ I 8 t) ■ 0 10 C t; 4 

■ 16 

0 20 I iD.lj 'j 20 -0 12' -8- 



•Test 1 

" i 
m. ■ 

.T2S5 

i ' 

Tot. 

T2S6 



5 0 10 0 10 0 . J I 10 0 
20 ' 

49- 0 ?0 1 17 £ ' 19 1 2C ■ 0 

Pick each casebook down under each big box.

3 ■ 2 10 0 '^^ 0 0 7 ^ > 



5 1 9 0 9 u 0 8 y 5. 
16 Ifc 
-8- 3 19 ,0, 18 1? C, 1!) 7 



Select some erasers down from a big shelf. 
0 9 5 0 10 0 10 h \ 1 ") ' 
0 9 5 1 10 0 10 ') 2 6 il 
Tot. 0 18 11 ■ 1 ?0 0 n , 7 ) 
T3S1 Repair a damaged razorblade in the seventh box.
' 0 10 d 10 0 10 0 ID 5 0 10 1 

0 10 0 10 0 10 0 9 5 1 10 1 ^ 

■ : . ■ 16 

Tat. 0 20 0 20 0 20 0 L9 10 . 1 20 2 
T3S2 Select an apple from all seven shelves.

' 0 ,10 0 10 1 1 9 6 , 0 5, 

'O 10 0 10 1 'I 9 6 0 ,'; 
' 20 
0 20 0 20 2 a 18 12'' 0 -19-' 

Paste some books down onto all surfaces.

5 0 8 7 "1^ 0 8 7 0 5 

5' ' 0 8 6 6 6 9 1 t ' 
PO 18 
Tqt. -i9- 0 16 U 11 0 li^ 16 1 -9- 

T3S4 Hide a razor in the top room.

: % 0 IC J 7 ,0 7 ^? 

.. , .<) 0 10: 2 7 1 7 ' 3 • . 

, " 2U , 12 

-19- 0 20 5 lit 1 1'* -fe- 



Tdt. 



Tot 
T3S5 Get the apple down from the top corner.

Tffit I 5 o' .. 10 0 10 li I 6 9 0 , 
•■ 2 5 0 10 0 10 ^5' 0 5 10 0 

.19- 0 , 20 ^ 20 9 1 11 19 0 . 
T3S6 Pick seven telephones up from any big room.

1 9' 0 , 10 0 6 9 2 B 1 'I 5 

2' rj 0 10 0 6 9 ^ 90 5 5 ■ 
• 6 20 
Tat. -3- 17 0 20 0 12 18 3 17 1 9 -W- 

T4S1 Paste each damaged casebook up over a shelf.

0 ■? 6-" 0 ' 10 0 9 6 't !■ 5 

.2 .7 t 0 . 10 0 9 6 1 i 3 
l( ■ , ' ?o 

Tot, -2- ID ■ 12 0 ■ 20 Q 16 12, 5 3 -10- 

T4S2 Get seven erasers out beneath the settee.

19 1 It 10 0 10 0 10 0 6. \ 

0 10 2 ) 10 0 10 0 10 0 5 5 : 

2 18 , 
Tot. -1-19 3 7 20 0 20 0 .20 0 11 -3'' 

T4S3 Remove each damaged apple out under the bottom

0 -j ?■ B 0 10' 0 10 3 0 5 9 1 

0 5 7 8 0 10 0 9 3 ,1 5 10 0 

Tat. 0 10 I't 16 0 20 0 19 10 .1 10 ' 19 1 

T4S4 Destroy red pamphlets in the seven books.

.07.67 t 5 1 10 0, 5 

"0-6 6 8 ' 2 7 1 10 0 5 

Tot. 0 13 12,15 6 12 2 20 0 -19- 

T4S5 Hide the telephone books up under the corner.

5 0 10 0 6 5 ,9 3 1 10 ■- 

J .0 10 0 6 4 10 5 2 3 10 ", ; 

Tot. -i9-. 0 20 .0 12 9 19 10 3 7 20(SyU. 



table. 

10 0 
10 0 
20 0 



lost) . 



T4S6 Select each damaged razorblade in every surface.
Test 1 0 5^, 9 70 90 10 1 9 0 10 0. 

" 2 0 5 9 6 0 10 0 10 0 10 0 10 0 
TQ,t. 0 10 IS 13 0 19 0 20 1 19 0 20 0 

T4S7 Put the damaged table up onto the surface.

5 0 10 p 10 0 8 7 1 10 0 

5 0 10 ' 0 10 1 8 5 1 !l 10 01 
20 ■' , 

Tot. -19- .0 20; 0 20 1 16 13 2 B 20 0; ■ 

T4S8 Hide some red apples up under the baskets.

:' 5 0 7 8 0 9 6 0 3^ 10 0 

5 0 6 9 0 9 6; 2 3 10 0, 
20 ■ ■ 

Tot, -18-, -0 13 17 0 18 12; 2 ■ 8 20 Q 






Appendix iC , •,' . ■ . , Appendn ID 



km 


Select each eraser down beneath all these seven desks.


. 1234 


Hide the apple to the lowest surface.




?.k 20 ?k ih 17 IJ 15 20 Ih 13 12 2} 12 


le 


■ F 


18 11 ' I'l 111 11 5 7 7 7 25 


Dur. 


71; 1710 16 20 28 11 '22 " 22' 16 10 16 


15 


■ D 


21 10 13 17 13 10 18 B 10 l| 


■ Int. 


II I? 0 9 I 7 5 3 8 150 


k 


I 


17 11 13 4 8 13 14 13 7 0 


' . T1S2 


Repair the damaged pamphlet above the eleventh chair.


;T2B5 


Pick each casebook down under each big box.






10 


,p 


46 4? 4i 24, 31 30 28 3 31 2 


Dur. 


12 34 \h lis 17 18 ' IP 10 2h 26 6 p6 11 


C V 


,D 


7 10 1? 12 22 l4 10 14 18 17 


Int. 


6 12 9 3 6 8 6 It 10 507,0 




:I 


.8 6 10 12 12. 5 5 0 6 , 9 ■ , . 


I1S3 


Take the apple up from a big box.






Select some erasers down from a big shelf.


F 


29 32 15 l'^ IDIO' 15 19 15 






17 31 34 23 25 13 26 29 16 21 25 ' 


' D 


17 9 15:21 12 32 19 21 15 




D 


4,14' . 5 12 20 1) 35 10 18 24 18, 


I 


12 7' 3 0 6 J 1 ,7 2 ■ 




I 


0 10 5 .4 9 0 11 |4 2 ,3 ■ B; i ■ 


US'* 


Move the phonebooks out beneath seven shelves.




T3S1 


Repair a damaged razorblade in the seventh box.


F 


27 ,5'2' £3. 10 23 20' 13 21 13 20 




iF 


12 17 12 13 12 13 11 6j 7 7 13 1? 7 


' D 


29 7 2lJ 14 21 6' 21 9 12 e6 




> 

P 


11 23 9 2b 8 18 11 24, '12 12' 11 ,L1 21 


I 


8 6 ■ 3 8 9 5: 3 706. 




i 


* 9,13 1112 3 12 4 7i 4 ,0 11 I7 11 ; 


T2S1 


Find a damaged razorblade down beneath all seven desks.


iT3S2 


Select an apple from all seven shelves.


P 


22 23 15 12 16 U 9 10 10 8 8 1 10 6 

1 


13 




9,14 2,12314 i 12 112 10 [n 


D 


18 11 W Ih 20 10 22' ?4 8 '"18 19 12 13; 


16 




6112 9117 9 li| 25 ill 12 21 


r 


10 it 8 1 90 2 1?, 5' 0 6 11 2 


11 


, ( . 


9 lb 11 1^ 8 !() 12 1I2 0 12 


T2S2 


Remove each eraser up from any maroon box.




k3S3 


I ' a!' .' 

Paste some books down onto all surfaces.


P 


15 21 15 2k 13 12 12. 17 6: 8 n 12 / . 




''^ . 


15' 18 17 15 8li3 8 Ij 9 7' ■ ' 


D 


10 21 12 5 17 15 11 21 10 21 9 26 l|i 




Ip 


14 19 , 17 24 17 If 11 9 (3) 61 


I . 


3 7 1 3 12 2 11 5 5 2 '0 1 '8 




ii 


9i 1 1 6 :4 3 b 5 0 0 0: 


T2S3 


Paste some booklets up under each maroon shelf.




:|r3s4 


Hide a razor in the top room.


F 


■1 21 16 5 , 9 9,7 .9 11 7' 10 ■ i 






18 5 13i 5 5 !5 11 19 


u 


13 1^ 9 1?, 9 15 10 IJ ,12 31: 16' i 




f 


18 11 24 ao 16 7 17 24 


I 


1) I? B 7 10 10 3 0 3 : 10 i 


1 


11 


■ ! 1 ! : 
18 9 13 ' 6 6 lo 15 2 * , 
Get the apple down from the top corner.

11*^ 21 9 8 11 10 , it 6 1) ^ 

13 9 19 16 ?? 11 7 16 11 b 

15 8 lb '<i 16 5 0 15 n ' . ; • 
Pick seven telephones up from any big room.
Ik 15 10 16 7 8- 11 10 8 15 3 10 ■ 

9 1? I'l p 9 2'J 15 18 11 13 >^ 
15 16 9 15 15 10 15 9 5 0 'I '1 

Paste each damaged casebook up over a shelf.

?2 30 16 17'' 10 0 13 11 9' 7 1 

lit 14 ■ 16 16 9' 10 9 17 7 7 U 

9 16 8 9 0 12 5 2 2 7 

Get seven erasers out beneath the settee.

1) 19 2? l6 10 3 19 8 9 069 

11 10 10 le 16 12.19 12 12 58 15 

12.18 12 1)1} 3.16 0 ll ■ ' 1 11 2 

Remove each damaged apple out under the bottom table.

12 22 21 111 13 13 M2 U ,11 5 :9 13 $ ,0 
13'21 15119 7, lUli.'17'12 .131813 1U6 10 
3;11 11 ' 16 7 I'l '1 . 17 8 ho Ik 10 5 2 
Destroy red pamphlets in the seven books.

8: 17 11 13 1) ll 6 S 7 . • , 
1 21 21 19 11 13! 8 12 27 11 . 

'0 14 14: 9 3 1 b ir 0 ' 5 : 

Hide the telephone books up under the corner.

, (SylU! ' 

14 1117 13 4, 4 9 5 6 3 13 lost)' , 



15 5 If 923; 12 13 24 1, h iq . 
20 1 21 21 . 11. ?o n <i n iii - 



I 



J4S6 
F 
D 
I 

T4S7 
F 
D 
I 

1438 

f' 
p 
I 



Select each damaged razorblade in every surface.



9" 13 2.1 9 ■ 10' D 4 4 ^ , 6 5 4 12 

5 14 13 15 12 Id 15 29 6 14 12 '\ 10 $ 

5 17' 7 14 5.' 9 0 5 .0 8 2 \9 0 

Put the damaged table up onto the surface.

23 14 IT.' 13 k 9 10 12 8 5 13 0 

8 11 19 9 / IT It 8 13 ' '8 7 11 8 

11 2 14 5/ 8 4 12 6 1 0 11 0 



(djvcd. 



Hide some red apples up under the baskets.

16 13 i U 2 9. 10 8 . 0 8 0 : 
16 14 i6 16 15 12 V-t 8 9 16 6 



14 B 9 12 0 7 4 0 ,0' 8 0 